
An interesting Putnam problem on the Pigeonhole Principle

The following problem is contained in the book “Putnam and Beyond” by Gelca, and I saw it on stackexchange. I’m mainly recording this solution because it took me longer than usual to come up with, as I was led down the wrong path many times. Noting what is sufficient for a block of numbers to have a square product is the main challenge in solving this problem.

Let there be a sequence of m terms, all of which belong to a set of n natural numbers. Prove that if 2^n\leq m, then there exists a block of consecutive terms, the product of which is a square number.

Let the n numbers be \{a_1,\dots,a_n\}, and define f(k) to be the n-tuple of 0‘s and 1‘s whose ith entry records, \pmod 2, the number of times a_i has appeared among the 1st through kth terms of the sequence.

So f(1) is the tuple with a 1 in one position and 0 everywhere else, etc.

Clearly, if f(k)=(0,0,\dots,0) for some k, then the product of the consecutive terms from the 1st to the kth is a square. If no f(k) is (0,0,\dots,0), then there are only 2^n-1 possible tuples, but at least 2^n values of k (since m\geq 2^n). Hence, two of them must be equal, say f(k_1)=f(k_2) with k_1<k_2. Then the product of the terms from position k_1+1 to k_2 is a square. Hence proved.
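The pigeonhole argument above is constructive, and can be sketched in code (the helper name and the sample sequence are mine, not from the book): track the parity vector of the prefix counts as a bitmask, and return the block between two equal prefixes.

```python
def square_block(seq, values):
    """Find (i, j) such that the product of seq[i:j] is a perfect square.

    Guaranteed to succeed when len(seq) >= 2**len(values), by pigeonhole
    on the parity vectors of the prefixes.
    """
    index = {v: k for k, v in enumerate(values)}
    seen = {0: 0}               # parity bitmask of a prefix -> prefix length
    mask = 0
    for j, x in enumerate(seq, start=1):
        mask ^= 1 << index[x]   # flip the parity of the value that appeared
        if mask in seen:
            return seen[mask], j    # the block seen[mask]..j has all-even counts
        seen[mask] = j
    return None
```

For the sequence [2, 3, 2, 5, 3, 5, 2, 2] of length 2^3 over the values {2, 3, 5}, this returns the block (0, 6), whose product is 2·3·2·5·3·5 = 900 = 30².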

Proving that the first two and last two indices of the Riemann curvature tensor commute

I’ve always been confused with the combinatorial aspect of proving the properties of the Riemann curvature tensor. I want to record my proof of the fact that R(X,Y,Z,W)=R(Z,W,X,Y). This is different from the standard proof given in books. I have been unable to prove this theorem in the past, and hence am happy to write down my proof finally.

Define f(X,Y,Z,W)=R(X,Y,Z,W)-R(Z,W,X,Y). We want to prove that this function is identically 0.

By simple usage of the first Bianchi identity R(X,Y,Z,W)+R(Y,Z,X,W)+R(Z,X,Y,W)=0, and the fact that switching the first two or the last two vector fields gives us a negative sign, we can see that

f(X,Y,Z,W)=\big[-R(Y,Z,X,W)-R(Z,X,Y,W)\big]-\big[-R(W,X,Z,Y)-R(X,Z,W,Y)\big]=-R(Y,Z,X,W)+R(X,W,Y,Z),

since R(X,Z,W,Y)=R(Z,X,Y,W) and R(W,X,Z,Y)=R(X,W,Y,Z) (each obtained by flipping both pairs).

Hence, f(X,Y,Z,W)=f(X,W,Y,Z).
Now note that R(X,Y,Z,W)=R(Y,X,W,Z), obtained by switching both the first two and the last two indices. However, applying the identity above to the quadruple (Y,X,W,Z),

f(X,Y,Z,W)=f(Y,X,W,Z)=f(Y,Z,X,W)=-f(X,W,Y,Z),

where the last equality holds because f(A,B,C,D)=-f(C,D,A,B) by the definition of f.

As f(X,Y,Z,W) equals both f(X,W,Y,Z) and -f(X,W,Y,Z), we can conclude that it is 0.

Hence, R(X,Y,Z,W)=R(Z,W,X,Y).

It is not easy to prove this theorem because manipulating the indices mindlessly (or even with some game plan) can lead you down a rabbit hole without ever reaching a conclusion. Meta-observations, like the one above, are necessary to prove this assertion.
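Though not part of the proof, the statement can be sanity-checked numerically: applying the Young symmetrizer for the shape [2,2] to a random 4-tensor produces a tensor with exactly the assumed symmetries (antisymmetry in each pair and the first Bianchi identity), and the pair-exchange symmetry then comes along for free. This is a standard construction, sketched here as an illustration:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n, n, n))

# Young symmetrizer for shape [2,2]: symmetrize over the row swaps
# (slot 0 <-> slot 2) and (slot 1 <-> slot 3), then antisymmetrize over
# the column swaps (slot 0 <-> slot 1) and (slot 2 <-> slot 3).
def sym_rows(T):
    out = np.zeros_like(T)
    for a, b in product([False, True], repeat=2):
        S = T
        if a: S = S.transpose(2, 1, 0, 3)
        if b: S = S.transpose(0, 3, 2, 1)
        out += S
    return out / 4

def antisym_cols(T):
    out = np.zeros_like(T)
    for a, b in product([False, True], repeat=2):
        S, sign = T, 1
        if a: S, sign = S.transpose(1, 0, 2, 3), -sign
        if b: S, sign = S.transpose(0, 1, 3, 2), -sign
        out += sign * S
    return out / 4

R = antisym_cols(sym_rows(T))

# hypotheses of the theorem: pair antisymmetries and the first Bianchi identity
bianchi = R + R.transpose(1, 2, 0, 3) + R.transpose(2, 0, 1, 3)
```

The assertions that R is antisymmetric in each pair, satisfies Bianchi, and obeys R(X,Y,Z,W)=R(Z,W,X,Y) all pass on this generic example.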

(More or less) Effective Altruism: March and April

In March, I donated $250 to EA.


In April, I decided to donate $250 instead to the Association for India’s Development to fight coronavirus in India:


Thinking about a notorious Putnam problem

Consider the following Putnam question from the 2018 exam:

Consider a smooth function f:\Bbb{R}\to\Bbb{R} such that f\geq 0, f(0)=0 and f(1)=1. Prove that there exists a point x and a positive integer n such that f^{(n)}(x)<0.

Only 10 students were able to solve it completely, making it the hardest question on the exam. I spent a day thinking about it; my “proof” differs a lot from the official solutions, and is really only a heuristic.

Proof: Assume that there do not exist any x and n such that f^{(n)}(x)<0. We will compare f with the functions x^n on [0,1], and prove that f\leq x^n there for every n. Because x^n\to 0 pointwise on [0,1) as n\to\infty, this forces f=0 on [0,1) while f(1)=1, so f cannot even be continuous, a contradiction.

Why is f\leq x^n? Let us first analyze what f looks like. It is easy to see that f(x)=0 for x\leq 0. This is because f\geq 0: if f(x)>0 for some x<0, then f would have to decrease to 0 at x=0, so a negative derivative would be involved, which is a contradiction. Hence, f(x)=0 for x\leq 0, and by continuity of derivatives for smooth functions, all derivatives at x=0 are also 0.

Now consider the functions x^n, which are 0 at x=0 and 1 at x=1. These are the same endpoints for f(x) in [0,1]. If f(x) ever crosses x^n in [0,1), then it will have a higher nth derivative than x^n at the point of intersection. As its (n+1)th derivative is also non-negative, f will just keep shooting above x^n, and hence never “return” to x^n at x=1. This contradicts the fact that f(1)=1. Hence, f will always be bounded above by x^n in [0,1]. As this is true for all n, f=0 on [0,1) and f(1)=1. This contradicts the fact that f is continuous.

Putnam 2010

The Putnam exam is one of the hardest and most prestigious mathematical exams. Every year, more than 4,000 students, including math olympiad medalists from various countries, attempt it. The median score, almost every year, is 0. Each correctly answered question is worth 10 points.

I often find myself trying to solve old Putnam problems, mainly because it serves as a form of “mathematical” procrastination: something that distracts me from my actual job of trying to solve my research problem. Yesterday, I ended up solving most questions from the 2010 Putnam paper. This is probably due to the fact that 2010 was one of the easiest papers in the recent past. When I compared my solutions with the solutions online, they turned out to be pretty different. Hence, I am recording them here.

The problems can be found here

A1: We need to find the largest number of boxes. Clearly, we want to place as few elements as possible in each box. If n is odd, place the largest element n in a box by itself, and let every other box contain a pair of the form \{i,n-i\}; every box then sums to n. Hence, we can have \frac{n+1}{2} boxes. Why can we not have even more boxes? Let a be the smallest number of elements in any box. If a=1, then the singleton box must be \{n\}, and the configuration above is essentially the only possibility. If a\geq 2, then there are at most \frac{n-1}{2} boxes, which is fewer than the \frac{n+1}{2} boxes obtained above.

If n is even, one such configuration is the pairs \{i,n+1-i\}, each summing to n+1. Using an argument similar to the one above, we can conclude that the highest number of boxes possible is \frac{n}{2}.
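The claimed answer, \frac{n+1}{2} rounded down, can be checked by brute force for small n. The function below is a sketch I wrote for this post; it searches over all ways of filling k equal-sum boxes, largest k first:

```python
def max_equal_boxes(n):
    """Brute force: the largest k such that 1..n can be split into k boxes
    with equal sums. Exponential search, intended only for small n."""
    total = n * (n + 1) // 2
    nums = list(range(n, 0, -1))          # place the largest numbers first

    def can_fill(remaining, boxes):
        # boxes holds the remaining capacity of each box
        if not remaining:
            return True                    # all capacity used: sums are equal
        x = remaining[0]
        seen = set()
        for i, b in enumerate(boxes):
            if b >= x and b not in seen:   # skip boxes with identical room
                seen.add(b)
                boxes[i] -= x
                if can_fill(remaining[1:], boxes):
                    boxes[i] += x
                    return True
                boxes[i] += x
        return False

    for k in range(n, 0, -1):
        if total % k == 0 and can_fill(nums, [total // k] * k):
            return k
    return 1
```

For n from 1 to 8 this returns 1, 1, 2, 2, 3, 3, 4, 4, matching \lfloor\frac{n+1}{2}\rfloor in both the odd and even cases.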

A2: First we prove that the derivative of the function above is periodic with period 1. For any x, applying the given condition with n=1 and n=2, we know that

f'(x)=\frac{f(x+1)-f(x)}{1}=\frac{f(x+2)-f(x)}{2}.

Hence f(x+2)-f(x)=2f'(x), and subtracting f(x+1)-f(x)=f'(x) gives f(x+2)-f(x+1)=f'(x). Therefore, f'(x+1)=\frac{f(x+2)-f(x+1)}{1}=f'(x). This way, we have proven that the derivative of f is periodic with period 1.

What this shows is that the function repeats the same “shape” every time we move to the right or left by 1. If f is not a straight line, then there will be two points in [0,1] with different derivatives. Let us also assume without loss of generality that f(1)>f(0), so that f'(0)=\frac{f(1)-f(0)}{1} is positive. Let p\in[0,1] be a point whose derivative differs from f'(0). However, f'(p)=\frac{f(p+1)-f(p)}{1} will be the same as f'(0), because f(p+1)-f(p)=f(1)-f(0). This is because the function repeats its shape with a period of 1. Hence,

f(p+1)-f(p)=\int_p^{p+1}f'(x)\,dx=\int_p^1 f'(x)\,dx+\int_0^p f'(x)\,dx=f(1)-f(0), where the middle equality uses the periodicity of f'.

This contradicts the assumption that f'(p)\neq f'(0).

Therefore, the derivative is the same at all points, and straight lines are the only solutions.
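As a quick numerical spot-check of the condition f'(x)=\frac{f(x+n)-f(x)}{n}: straight lines satisfy it for every n, while a nonlinear function such as x^2 fails, since its difference quotient is 2x+n rather than 2x. The sample points below are arbitrary:

```python
def check(f, fprime, xs, ns, tol=1e-12):
    # does f'(x) == (f(x+n) - f(x)) / n hold at all sampled points?
    return all(abs(fprime(x) - (f(x + n) - f(x)) / n) < tol
               for x in xs for n in ns)

xs = [-1.5, 0.0, 0.7, 2.0]
ns = [1, 2, 3, 5]

linear = check(lambda x: 3 * x - 2, lambda x: 3.0, xs, ns)   # lines pass
square = check(lambda x: x * x, lambda x: 2 * x, xs, ns)     # x^2 fails
```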

A3: Consider the curve \gamma: t\to (at,bt). Clearly, a\frac{\partial h}{\partial x}+b\frac{\partial h}{\partial y}=\frac{d}{dt}h(\gamma(t)). Now, as h(\gamma(t))=\frac{d}{dt}h(\gamma(t)), the function g(t)=h(\gamma(t)) satisfies g'=g, so g(t)=g(0)e^t, which is unbounded unless g(0)=h(0,0)=0. Since h is bounded, h(0,0)=0, and translating the curve to start at an arbitrary point shows in the same way that h vanishes everywhere.

A4: Let n=2^ab, where b is odd. Then we claim that 10^{10^{10^n}}+10^{10^n}+10^n-1 is divisible by 10^{2^a}+1.

Proof: 10^n=(10^{2^a})^b\equiv (-1)^b=-1\pmod {10^{2^a}+1}, since b is odd.

Now 10^{10^n}=10^{10^{2^a b}}=10^{2^{2^a b}5^{2^a b}}=(10^{2^a})^{2^{2^ab-a}5^{2^ab}}. The exponent of 10^{2^a} here is even (as 2^ab-a\geq 1), so this is \equiv (-1)^{\text{even}}=1 \pmod {10^{2^a}+1}.

Similarly, 10^{10^{10^n}}\equiv 1\pmod {10^{2^a}+1}. Adding the residues, 1+1+(-1)-1=0, so 10^{10^{10^n}}+10^{10^n}+10^n-1\equiv 0\pmod{10^{2^a}+1}.
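The divisibility claim can be verified for small n with modular exponentiation; the towers are only ever reduced modulo the claimed divisor, so nothing astronomically large is computed (the exponents 10^{10^n} are still big integers, which keeps this to small n):

```python
def claimed_divisor(n):
    # write n = 2**a * b with b odd; the divisor claimed above is 10**(2**a) + 1
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return 10 ** (2 ** a) + 1

def tower_residue(n):
    # (10^(10^(10^n)) + 10^(10^n) + 10^n - 1) mod claimed_divisor(n),
    # via three-argument pow so only residues are ever multiplied
    d = claimed_divisor(n)
    return (pow(10, 10 ** (10 ** n), d) + pow(10, 10 ** n, d)
            + pow(10, n, d) - 1) % d
```

For n = 1, 2, 3, 4 the residue comes out to 0, as the proof predicts.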

B1: Let us assume that such a sequence exists. Taking m=2, we have a_1^2+a_2^2+\dots+a_n^2+\dots=2. If some element a_i has |a_i|>1, then a_i^{2n} will be much greater than 2n for large enough n, contradicting a_1^{2n}+\dots+a_i^{2n}+\dots=2n (all the other even-power terms are non-negative). Therefore, |a_i|\leq 1 for all i. The a_i‘s that are exactly \pm 1 each contribute 1 to every even-power sum, so if there are k of them we can subtract them from both sides. For the remaining terms, which satisfy |a_i|<1, \sum a_i^{2n} is a decreasing sequence in n, while the right-hand side, 2n-k, is increasing. This is a contradiction. Therefore, no such sequence exists.

B2: The smallest such length is 3. Let A=(0,0). If the smallest length is 1, then without loss of generality let B=(1,0). Note that if C=(x,y), then both \sqrt{x^2+y^2} and \sqrt{(x-1)^2+y^2} need to be integers. This is impossible. Similarly, if the smallest length is 2, then without loss of generality we can assume that B=(-2,0). Then if C=(x,y), both \sqrt{x^2+y^2} and \sqrt{(x+2)^2+y^2} need to be integers. This can also be seen to be impossible by a simple case-by-case analysis: since (n+1)^2-n^2=2n+1, consecutive squares quickly grow further than 2 apart, so two distinct integer distances from points at distance 2 can only coexist for small coordinates. Without loss of generality, we can assume x,y\geq 1 (y\neq 0, as the three points are not collinear). This leaves only a couple of possibilities for x,y, which can be checked by hand to be impossible. Hence the minimum length is 3, realized by the right triangle with sides 3,4,5: the points can be taken as (0,0), (3,0) and (0,4).
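A small exhaustive search over a window of lattice points supports this (the window size 8 is my choice for the sketch; the theorem itself needs no bound):

```python
from math import isqrt

def int_dist(p, q):
    # the distance between p and q if it is an integer, else None
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    r = isqrt(d2)
    return r if r * r == d2 else None

def collinear(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) == (b[1] - a[1]) * (c[0] - a[0])

best = None
A = (0, 0)                       # translate one vertex to the origin
R = 8                            # arbitrary search window
for bx in range(-R, R + 1):
    for by in range(-R, R + 1):
        B = (bx, by)
        ab = int_dist(A, B)
        if ab is None or ab == 0:
            continue
        for cx in range(-R, R + 1):
            for cy in range(-R, R + 1):
                C = (cx, cy)
                if collinear(A, B, C):
                    continue
                ac, bc = int_dist(A, C), int_dist(B, C)
                if ac is None or bc is None:
                    continue
                m = min(ab, ac, bc)
                if best is None or m < best:
                    best = m
```

Within this window the smallest side found is 3, attained by the 3-4-5 triangle.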

B3: This is indeed a fantastic problem, with a simple solution: first transfer all the balls to B_1, and then transfer them from B_1 to the other baskets. As a move draws exactly i balls from basket B_i, we need at least one basket B_i to contain i or more balls, or else no balls can be moved at all. The worst configuration has i-1 balls in each B_i, so to guarantee a first move the number of balls must be at least 0+1+\dots+2009+1=\frac{2009\times 2010}{2}+1. Hence, the minimal value of n is the ceiling of this number divided by 2010, which is 1005.

B5: This is a pretty tricky problem. The way that I finally thought it through is to visualize what f(f(x)) would look like for an increasing function f(x). If f(x) is always positive, then f(f(x)) for x\to-\infty stays within an \epsilon distance of a fixed positive number, which is a contradiction, as f'(x) gets arbitrarily close to 0 for x\to -\infty. Now we shall allow f(x) to be negative. If f(x) is not bounded below, then there will exist negative x such that f(f(x)) is negative. This contradicts the fact that f'(x) is always non-negative. Now consider f(x) which is bounded below, but can still be negative. Clearly f(x) has to become positive at some point, as f'(x) must be positive. If it becomes positive only when x>0, then f(f(x)) for x<0 will still be negative, which is a contradiction, as f'(x)=f(f(x)) must be non-negative. Let -a (with a>0) be \lim\limits_{x\to-\infty} f(x). If f(x) becomes positive only when x>-a, then f(f(x)) for x\to-\infty will be negative, which is again a contradiction. Hence, f(x) has to become positive for some x\leq -a. Now if f(-a)>0, then f(f(x)) for x\to -\infty will be arbitrarily close to a fixed positive number, which is a contradiction, as f'(x) for x\to -\infty approaches 0. Hence, f(-a) must be 0.

Hence, the only remaining possibility is a function f(x) such that \lim\limits_{x\to -\infty} f(x)=-a and f(-a)=0. Let us analyze this function. We know that f'(-a)=f(f(-a))=f(0). Also, for x\in(-a,0], f'(x)>f'(-a)=f(0), as f'=f\circ f is increasing. Integrating from -a to 0 gives f(0)=f(0)-f(-a)>af'(-a)=af(0). This implies that a<1. I haven’t yet determined how to eliminate this case.

Lie derivatives: a simple idea behind a messy calculation

I want to write about Lie derivatives. Because finding good proofs for Lie derivatives in books and on the internet is a lost cause. Because they have caused me a world of pain. Because we could all do with less pain.

In all that is written below, we assume that all Lie derivatives are being found in the direction of the vector field X, and that \phi_t is the flow along this vector field.

What is a Lie Derivative? A Lie derivative is a derivative, but for things more complicated than functions. Basically, for a given vector field X, T(p+tX)\approx T(p)+t(L_XT)(p) (roughly speaking). Here T is a tensor, and a function is a special case of a tensor:

f(p+tX)=f(p)+tL_Xf=f(p)+tD_Xf, where D_Xf=L_Xf.

How do we then find a workable definition for a Lie derivative? Let us calculate the Lie derivative of a vector field.

Let me first try and explain what I will try and do. I know how functions and their derivatives behave: f(p+tX)\approx f(p)+ t(derivative of f in the X direction). We will use this simple derivative rule of f, wherever we can, to find out what the Lie derivative of a vector field is.

For a vector field Y, we have Y(p+tX)=Y(p)+t(L_XY)(p).

This, however, does not make sense as written: Y(p)+t(L_XY)(p) is a vector at p, while Y(p+tX) is a vector at p+tX. Hence, the correct definition should be

(L_XY)(p)=\lim\limits_{t\to 0}\frac{(\phi^{-t})_*(Y(p+tX))-Y(p)}{t}.
This is equivalent to the following first-order relation:

(\phi^{-t})_*(Y(p+tX))=Y(p)+t(L_XY)(p)+O(t^2).
This, in turn, is equivalent to the formulation

(L_XY)(p)(f)=\lim\limits_{t\to 0}\frac{(\phi^{-t})_*(Y(p+tX))(f)-Y(p)(f)}{t}.
We shall now try to simplify these terms. For the second term in the numerator,

Y(p)(f)=Y(f)(p)=Y(f)(p+tX)-tX(Y(f))(p+tX)

to first order in t. Similarly,

(\phi^{-t})_*(Y(p+tX))(f)=Y(p+tX)(f\circ \phi^{-t})=Y(p+tX)(f-tX(f))=Y(f)(p+tX)-tY(X(f))(p+tX).

Adding these two terms together and dividing by t, we get (L_XY)(p)(f)=X(Y(f))-Y(X(f)).

A minor technical point is that the left hand side is at p, while the right hand side is evaluated at p+tX. However, we have implicitly taken the limit t\to 0. Hence, as the vector fields are continuous, we get

(X(Y(f))-Y(X(f)))(p+tX)\to (X(Y(f))-Y(X(f)))(p)
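To convince oneself that the flow definition really computes the bracket, here is a numerical sketch (the two vector fields are arbitrary examples of mine): approximate the flow \phi^t by RK4 integration, push Y back with the Jacobian of \phi^{-t}, and compare against the coordinate formula [X,Y]^i=X^j\partial_j Y^i-Y^j\partial_j X^i.

```python
def X(p):
    # an arbitrary smooth vector field on R^2 (example only)
    return (p[1], -p[0] + 0.3 * p[0] * p[1])

def Y(p):
    return (p[0] * p[0], p[0] + p[1])

def flow(p, t, field, steps=100):
    """Integrate dp/ds = field(p) from s=0 to s=t with RK4."""
    h = t / steps
    x, y = p
    for _ in range(steps):
        k1 = field((x, y))
        k2 = field((x + h/2*k1[0], y + h/2*k1[1]))
        k3 = field((x + h/2*k2[0], y + h/2*k2[1]))
        k4 = field((x + h*k3[0], y + h*k3[1]))
        x += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return (x, y)

def jacobian(f, p, h=1e-6):
    """2x2 Jacobian of a map f: R^2 -> R^2 by central differences."""
    cols = []
    for j in range(2):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        fp, fm = f(tuple(pp)), f(tuple(pm))
        cols.append(((fp[0] - fm[0]) / (2*h), (fp[1] - fm[1]) / (2*h)))
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))

def lie_derivative_flow(p, t=1e-3):
    """L_X Y at p from the flow definition, with a symmetric difference in t."""
    def pulled(s):
        q = flow(p, s, X)                           # q = phi^s(p)
        J = jacobian(lambda r: flow(r, -s, X), q)   # d(phi^{-s}) at q
        v = Y(q)
        return (J[0][0]*v[0] + J[0][1]*v[1], J[1][0]*v[0] + J[1][1]*v[1])
    a, b = pulled(t), pulled(-t)
    return ((a[0] - b[0]) / (2*t), (a[1] - b[1]) / (2*t))

def bracket(p):
    """[X, Y] at p from the coordinate formula, partials by central differences."""
    JX, JY = jacobian(X, p), jacobian(Y, p)
    Xp, Yp = X(p), Y(p)
    return tuple(
        sum(JY[i][j] * Xp[j] - JX[i][j] * Yp[j] for j in range(2))
        for i in range(2)
    )
```

At the sample point (0.5, 0.4) the two computations agree to roughly the accuracy of the finite differences.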

This expression generalizes really simply to tensors. Let us find out what the Lie derivative of a (0,n) tensor is:

(L_XT)(p)(Y_1,\dots,Y_n)=\lim\limits_{t\to 0}\frac{(\phi^{-t})_*(T(p+tX))(Y_1,\dots,Y_n)-T(p)(Y_1,\dots,Y_n)}{t}.
The terms can be simplified in a similar way as above: T(p)(Y_1,\dots,Y_n) is the value of a function at p. Hence, to first order in t, it is equal to

T(p+tX)(Y_1,\dots,Y_n)-tX\big(T(Y_1,\dots,Y_n)\big)(p+tX). On the other hand,

(\phi^{-t})_*(T(p+tX))(Y_1,\dots,Y_n)=T(p+tX)\big((\phi^{t})_*Y_1,\dots,(\phi^{t})_*Y_n\big)=T\big((Y_1-tL_XY_1)(p+tX),\dots,(Y_n-tL_XY_n)(p+tX)\big)

This is easily simplified to give

(L_XT)(Y_1,\dots,Y_n)=X\big(T(Y_1,\dots,Y_n)\big)-T(L_XY_1,Y_2,\dots,Y_n)-\dots-T(Y_1,\dots,Y_{n-1},L_XY_n).

Coming to grips with Special Relativity

Contrary to popular opinion, Special Relativity is not a more specialized, more involved part of General Relativity. It is the easier of the two Relativity theories, involving only thought experiments and Linear Algebra. However, despite having been exposed to ideas from this theory right from school, and also taking an advanced course (and doing well) in it, I have always felt that I don’t really understand this theory. And many people I know in grad school, who have taken this and more advanced courses, feel the same way.

Reason for not really knowing what’s going on: Time dilation is explained by light clocks. But that’s just one kind of clock!! What if we had a different kind of clock? Would it still show that time is slowing down in a moving frame? These and other misunderstood thought experiments give one the impression that only our perception of time and length is changing, and that time and length aren’t really changing. And this is despite accepting easily the two postulates of Special Relativity: that the speed of light is the same in all frames, and that the laws of Physics are valid in all inertial frames.

The motivation of this article is that we need better thought experiments to understand Special Relativity. And the author, recently fueled by Walter Isaacson’s brilliantly written biography of Einstein, hopes to provide just that.

Length contraction

Maxwell’s laws specify the speed of light, and their formulation suggests that it should be the same in all inertial frames. Now imagine that you’re traveling in a train, and you have a 1 ft wide window. A window that is 1 ft while stationary, will appear to be 1 ft long when it is moving, if you’re moving with it. Hence, lengths don’t change while you’re in the same frame. Now if you’re observing from the platform, the light will travel across the window in time t. Similarly, if you’re inside the train, light will travel across the window in time t. All good. So what has changed? If I stand on the platform and observe the moving train, I can see that the relative velocity of the train and the light beam is low. Hence, if the window becomes shorter in the direction of motion of the light beam, all will be well. Hence, the window remains 1 ft in the frame of the moving train. But it shortens in the frame of reference of the platform.

Does this mean that there is no absolute length? There is! It is the length measured in the frame of reference of the window.

Time dilation

Now we’ll have to move perpendicular to the motion of the train. We all know the famous time clock, in which a light beam bounces off mirrors that are placed parallel to the motion of the train.


Here’s what you need to remember: when the light beam hits the mirror, you see it, the person sitting inside the train sees it, everyone sees it at the same moment. Alright. Here we go.

The light travels a longer distance, if you’re observing from the platform. Hence, as the speed of light is the same in both frames, the person standing on the platform should see the light beam reflecting from the top mirror and arriving at the bottom mirror in t seconds, while the person sitting inside the train should see the light beam arriving at the bottom mirror in, say, t' seconds. Clearly, t>t'. However, they both see the light arrive at the bottom mirror at the same moment. There can be no discrepancy about this. It is almost like t' expanded, although remaining of the same magnitude, and became equal to t. This is what is called time dilation. For the person standing on the platform, if they were to look inside the train, they would imagine the world moving at a slower rate. Just imagine a slow motion movie running inside the train.

But has time really expanded? Have lengths really contracted? No! For the observer sitting in the train, the same length contractions and time dilations will happen for phenomena on the platform. Basically, there are two kinds of length: the length observed in the frame of the object, and the length observed from a moving frame. And the length observed from the moving frame is always shorter. The same can be said about time dilation.

Putnam A1, 2017

Putnam 2017, A1) Let S be the smallest set of positive integers such that

a) 2\in S

b) If n^2\in S, then n\in S

c) If n\in S, then (n+5)^2\in S

Which positive integers are not in S?

Although A1 is generally supposed to be one of the easiest problems on the Putnam, I have not been able to solve this problem in the past. Part of the difficulty arises from the fact that we are not given the answer and then asked to prove it. Hence, it is easy to miss cases, and I for one found it pretty difficult to determine all of them before looking at the answer (though not the proof).

Proof: We prove that all numbers except 1 and multiples of 5 belong to S. We know from conditions b and c that n\in S\implies (n+5)\in S. Hence, if we can prove that 2,3,4 and 6 belong to S, then we will have proved the assertion.

Proving that 4 belongs to S is perhaps the only non-trivial step in this problem. Note that we have to find a number of the form 4^{2^n} to accomplish this (so that taking square roots repeatedly yields 4), and numbers of the form 4^{2^n} are 1\pmod 5. We don’t yet know whether S contains a number that is 1\pmod 5, but we do know that it contains a number that is 4\pmod 5, namely (2+5)^2=49. Applying rule c to 49 gives 54^2, which is obviously 1\pmod 5. Now we’re in the game. We just need to find some 4^{2^n} that is larger than 54^2, and then add 5 enough times until we attain it (both numbers are 1\pmod 5). Then we take n square roots to get 4.

Similarly, 6^{2^n} is 1\pmod 5. We find some 6^{2^n} that is larger than 54^2, reach it in the same way, and then take n square roots to get 6\in S.

Now if 4\in S, then so does 4+5=9, and hence \sqrt{9}=3. Having proved that 2,3,4,6\in S, we’re done.
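As a sanity check, one can compute the closure of \{2\} under rules b and c by brute force, capping the size of the numbers generated. The cap below is my choice, large enough for the chain of squares that produces 4 via 4^{2^3}=65536 (6 then follows from smaller numbers):

```python
from collections import deque
from math import isqrt

CAP = 5 * 10 ** 9
S = set()
queue = deque([2])
while queue:
    n = queue.popleft()
    if n in S:
        continue
    S.add(n)
    m = (n + 5) ** 2            # rule c
    if m <= CAP and m not in S:
        queue.append(m)
    r = isqrt(n)                # rule b, applicable when n is a perfect square
    if r * r == n and r not in S:
        queue.append(r)
```

Within the cap, the members of S below 51 are exactly the positive integers other than 1 and the multiples of 5, matching the answer above.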

(Part of) a proof of Sard’s Theorem

I have always wanted to prove Sard’s Theorem. Now I shall stumble my way into proving a deeply unsatisfying special case of it, after a whole day of dead ends and red herrings.

Consider first the special case of a smooth function f:\Bbb{R}\to\Bbb{R}. At first, I thought that the number of critical points of such a function has to be countable. Then the number of critical values would also be countable, which would make the measure of the set of critical values 0. However, our resident pathological example, the Cantor set, makes things difficult. It turns out that the set of critical *points* can not only be uncountable, but even of non-zero measure (the canonical example of such a smooth function involves a modified Cantor set of non-zero measure). In fact, even the much humbler constant function has a set of critical points of positive measure. However, the set of critical *values* may still have measure 0, and it indeed does.

For f:\Bbb{R}\to\Bbb{R}, consider the restriction of f to [a,b]\subset \Bbb{R}. Note that the measure of critical points of f in [a,b] has to be finite (possibly 0). Note that f'(x) is bounded in [a,b]. Hence, at each critical *point* p in [a,b], given \epsilon>0, there exists a \delta(\epsilon)>0 such that if m(N(p))<\delta(\epsilon), then m(f(N(p)))<\epsilon. This is just another way of saying that we can control the measure of the image.

Note that the reason why I am writing \delta(\epsilon) is that I want to emphasize the behaviour of \frac{\epsilon}{\delta(\epsilon)}. As p is a critical point, at this point \lim\limits_{\epsilon\to 0}\frac{\epsilon}{\delta(\epsilon)}=0. This comes from the very definition of the derivative of a function being 0.

Divide the interval [a,b] into cubes of length <\delta(\epsilon). Retain only those cubes which contain at least one critical point, and discard the rest. Let the final remaining subset of [a,b] be A. Then the measure of f(A)\leq \text{number of cubes}\times\epsilon. The number of cubes is \frac{m(A)}{\delta(\epsilon)}. Hence, m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}. Note that f(A) contains all the critical values.

As \epsilon\to 0, we can repeat this whole process verbatim. Everything pretty much remains the same, except for the fact that \frac{\epsilon}{\delta(\epsilon)}\to 0. Hence, m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}\to 0. This proves that the set of critical values has measure 0, when f is restricted to [a,b].
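The shrinking ratio \frac{\epsilon}{\delta(\epsilon)} can be seen numerically. For the sample function f(x)=x^3-x (my example, not part of the theorem), cover [-2,2] by cells of width h, keep only the cells containing a critical point, and bound the measure of the critical values by the total spread of f over those cells. Near the nondegenerate critical points the bound decays like h^2, much faster than the width h of a cell:

```python
def f(x):
    return x ** 3 - x

def fprime(x):
    return 3 * x ** 2 - 1

def critical_value_bound(h, lo=-2.0, hi=2.0, sub=50):
    """Sum of the image diameters of the width-h cells containing a critical point."""
    total = 0.0
    n = int((hi - lo) / h)
    for i in range(n):
        a = lo + i * h
        b = a + h
        # the cell contains a critical point if f' changes sign on it
        if fprime(a) * fprime(b) <= 0:
            vals = [f(a + k * (b - a) / sub) for k in range(sub + 1)]
            total += max(vals) - min(vals)
    return total
```

Refining h by a factor of 10 shrinks the bound by roughly a factor of 100, which is the \frac{\epsilon}{\delta(\epsilon)}\to 0 behaviour in the proof.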

Now when we consider f over the whole of \Bbb{R}, we can just subdivide it into \cup [n,n+1], note that the set of critical values for all these intervals has measure 0, and hence conclude that the set of critical values for f over the whole of \Bbb{R} also has measure 0.

Note that this argument can be generalized to any f:\Bbb{R}^n\to \Bbb{R}^n.

Also, the case for f:\Bbb{R}^m\to \Bbb{R}^n where m<n is trivial, as the image of \Bbb{R}^m itself should have measure 0.

Furstenberg’s topological proof of the infinitude of primes

Furstenberg and Margulis won the Abel Prize today. In honor of this, I spent the better part of the evening trying to work out Furstenberg’s topological proof of the infinitude of primes. I went down the wrong road at first, but then, after ample hints from Wikipedia and elsewhere, I was able to come up with Furstenberg’s original argument.

Furstenberg’s argument: Consider \Bbb{N}, with the topology generated by the arithmetic progressions \{a+b\Bbb{N}\}, where a,b\in\Bbb{N}. It is easy to see that such progressions are also closed (each is the complement of finitely many progressions with the same common difference). Non-empty open sets, being unions of infinite progressions, have to be infinite. However, if there were only finitely many primes p_1,\dots,p_n, then each \{p_i\Bbb{N}\} would be closed, so \Bbb{N}\setminus (\cup_i \{p_i\Bbb{N}\})=\{1\} would be open and finite, which is a contradiction.
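The arithmetic fact doing the work here is that every natural number other than 1 lies in some progression \{p\Bbb{N}\}. A small sketch of that set identity (the bound 100 is arbitrary):

```python
def primes_up_to(n):
    # simple sieve of Eratosthenes
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return [p for p in range(2, n + 1) if sieve[p]]

N = 100
multiples = set()
for p in primes_up_to(N):
    multiples.update(range(p, N + 1, p))

# the complement of the union of the progressions pN, inside [1, N]
complement = set(range(1, N + 1)) - multiples
```

Only 1 survives: every other number up to the bound is swept up by a prime progression, which is exactly why the complement in Furstenberg’s topology is the finite set \{1\}.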

My original flawed proof: Let \{a+b\Bbb{N}\} be connected sets in this topology. Then, as one can see clearly, \Bbb{N}=\{2\Bbb{N}\}\cup\{1+2\Bbb{N}\}; in other words, it is the union of two open disjoint sets. Therefore, it is not connected. If the number of primes is finite, then \cap \{p_i\Bbb{N}\}=\{p_1p_2\dots p_n\Bbb{N}\}, which is itself an open connected set. Hence, as all \{p_i\Bbb{N}\} have a non-empty intersection which is open and connected, the union of all such open sets \cup \{p_i\Bbb{N}\} must lie in a single component. This contradicts the fact that \cup\{p_i\Bbb{N}\}=\Bbb{N}.

This seemed too good to be true. Upon thinking further, we realize that our original assumption was wrong: \{a+b\Bbb{N}\} can never be a connected set, as it can itself be written as a union of disjoint open sets (by refining the common difference), and in fact in an infinite number of ways. This topology on \Bbb{N} is bizarrely disconnected.