Fruits of procrastination

Month: March, 2020

Lie derivatives: a simple idea behind a messy calculation

I want to write about Lie derivatives. Because finding good proofs for Lie derivatives in books and on the internet is a lost cause. Because they have caused me a world of pain. Because we could all do with less pain.

In all that is written below, we assume that all Lie derivatives are being found in the direction of the vector field X, and that \phi_t is the flow along this vector field.

What is a Lie Derivative? A Lie derivative is a derivative, but for things more complicated than functions. Basically, for a given vector field X, T(p+tX)=T(p)+tL_XT (roughly speaking). Here T is a tensor, and a function is a special case of a tensor:

f(p+tX)=f(p)+tL_Xf=f(p)+tD_Xf, where D_Xf=L_Xf.

How do we then find a workable definition for a Lie derivative? Let us calculate the Lie derivative of a vector field.

Let me first explain what I will try to do. I know how functions and their derivatives behave: f(p+tX)\approx f(p)+ t(derivative of f in the X direction). We will use this simple derivative rule for f, wherever we can, to find out what the Lie derivative of a vector field is.

For a vector field Y, we would like to write Y(p+tX)=Y(p)+t(L_XY)(p).

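This, however, does not make sense, as Y(p)+t(L_XY)(p) is a vector at p, while Y(p+tX) is a vector at p+tX. Hence, the correct definition should be

Y(p+tX)=\phi^{t}_*\left(Y(p)+t(L_XY)(p)\right)

This is equivalent to the following definition:

\phi^{-t}_*(Y(p+tX))=Y(p)+t(L_XY)(p)

This, in turn, is equivalent to the formulation

(L_XY)(p)=\lim\limits_{t\to 0}\frac{\phi^{-t}_*(Y(p+tX))-Y(p)}{t}
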
We shall now try to simplify these terms. Since Y(f) is a function,

Y(p)(f)=Y(f)(p)=Y(f)(p+tX)-tX(Y(f))

to first order in t. Similarly,

\phi^{-t}_*(Y(p+tX))(f)=Y(p+tX)(f\circ \phi^{-t})=Y(p+tX)(f-tX(f))=Y(f)(p+tX)-tY(X(f))

Subtracting the first expression from the second and dividing by t, we get (L_XY(p))(f)=X(Y(f))-Y(X(f))

A minor technical point is that the left hand side is at p, while the right hand side is at p+tX. However, implicitly we have taken a limit t\to 0. Hence, as the vector fields are continuous, we get

(X(Y(f))-Y(X(f)))(p+tX)\to (X(Y(f))-Y(X(f)))(p)
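As a sanity check of the bracket formula, here is a small worked example. The fields X=\partial_x and Y=x\partial_y on \Bbb{R}^2 are my own choice for illustration, and the computation uses the usual difference-quotient form of the Lie derivative, (L_XY)(p)=\lim_{t\to 0}\frac{1}{t}(\phi^{-t}_*(Y(p+tX))-Y(p)). Both routes should give \partial_y.

```latex
% Assumed example (not from the derivation above): X = \partial_x, Y = x\,\partial_y on R^2.
\begin{align*}
\phi_t(x,y) &= (x+t,\,y), \qquad \phi^{-t}_* = \mathrm{id} \text{ on tangent vectors, since the flow is a translation},\\
\phi^{-t}_*\big(Y(p+tX)\big) - Y(p) &= (x+t)\,\partial_y - x\,\partial_y = t\,\partial_y,\\
(L_XY)(p) &= \lim_{t\to 0}\frac{t\,\partial_y}{t} = \partial_y,\\
X(Y(f)) - Y(X(f)) &= \partial_x\big(x\,\partial_y f\big) - x\,\partial_y\partial_x f = \partial_y f .
\end{align*}
```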

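This expression can be generalized really simply to tensors. Let us find out what the Lie derivative of a (0,n) tensor is:

(L_XT)(p)(Y_1,\dots,Y_n)=\lim\limits_{t\to 0}\frac{\phi^{-t}_*(T(p+tX))(Y_1,\dots,Y_n)-T(p)(Y_1,\dots,Y_n)}{t}
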
The terms can be simplified in a similar way as above: T(p)(Y_1,\dots,Y_n) is a function. Hence, it is equal to

T(p+tX)(Y_1,\dots,Y_n)-tD_X(T(p+tX)(Y_1,\dots,Y_n)). On the other hand,

\phi^{-t}_*(T(p+tX))(Y_1,Y_2,\dots,Y_n)=T(Y_1\circ\phi^{-t},\dots,Y_n\circ \phi^{-t})=T((Y_1-tL_XY_1)(p+tX),\dots,(Y_n-tL_XY_n)(p+tX))

This is easily simplified to give

(L_XT)(Y_1,\dots,Y_n)=X(T(Y_1,\dots,Y_n))-\sum\limits_{i=1}^n T(Y_1,\dots,L_XY_i,\dots,Y_n)

Coming to grips with Special Relativity

Contrary to popular opinion, Special Relativity is not a more specialized, more involved part of General Relativity. It is the easier of the two Relativity theories, involving only thought experiments and Linear Algebra. However, despite having been exposed to ideas from this theory right from school, and also taking an advanced course (and doing well) in it, I have always felt that I don’t really understand this theory. And many people I know in grad school, those who have taken this and more advanced courses, feel the same way.

Reason for not really knowing what’s going on: Time dilation is explained by light clocks. But that’s just one kind of clock!! What if we had a different kind of clock? Would it still show that time is slowing down in a moving frame? These and other misunderstood thought experiments give one the impression that only our perception of time and length is changing. Time and length aren’t really changing. And this is despite accepting easily the two postulates of Special Relativity: that the speed of light is the same in all frames, and that the laws of Physics are valid in all inertial frames.

The motivation of this article is that we need better thought experiments to understand Special Relativity. And the author, recently fueled by the brilliantly written biography of Einstein by Walter Isaacson, hopes to provide just that.

Length contraction

Maxwell’s laws specify the speed of light, and their formulation suggests that it should be the same in all inertial frames. Now imagine that you’re traveling in a train, and you have a 1 ft wide window. A window that is 1 ft while stationary, will appear to be 1 ft long when it is moving, if you’re moving with it. Hence, lengths don’t change while you’re in the same frame. Now if you’re observing from the platform, the light will travel across the window in time t. Similarly, if you’re inside the train, light will travel across the window in time t. All good. So what has changed? If I stand on the platform and observe the moving train, I can see that the relative velocity of the train and the light beam is low. Hence, if the window becomes shorter in the direction of motion of the light beam, all will be well. Hence, the window remains 1 ft in the frame of the moving train. But it shortens in the frame of reference of the platform.

Does this mean that there is no absolute length? There is! It is the length measured in the frame of reference of the window (the proper length).

Time dilation

Now we’ll have to look perpendicular to the motion of the train. We all know the famous light clock, in which a light beam bounces between mirrors that are placed parallel to the motion of the train.


Here’s what you need to remember: when the light beam hits the mirror, you see it, the person sitting inside the train sees it, everyone sees it at the same moment. Alright. Here we go.

The light travels a longer distance, if you’re observing from the platform. Hence, as the speed of light is the same in both frames, the person standing on the platform should see the light beam reflecting from the top mirror and arriving at the bottom mirror in t seconds, while the person sitting inside the train should see the light beam arriving at the bottom mirror in, say, t' seconds. Clearly, t>t'. However, they both see the light arrive at the bottom mirror at the same moment. There can be no discrepancy about this. It is almost like t' expanded, although remaining of the same magnitude, and became equal to t. This is what is called time dilation. For the person standing on the platform, if they were to look inside the train, they would imagine the world moving at a slower rate. Just imagine a slow motion movie running inside the train.
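The light clock argument can be turned into a short calculation. The sketch below is only an illustration: the function name `light_clock_times`, the mirror gap of 1 m, and the train speed of 0.6c are my own assumptions. It computes the one-way travel time t' in the train frame and the longer time t in the platform frame, using nothing but the Pythagorean theorem and the constancy of the speed of light.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def light_clock_times(mirror_gap_m, train_speed):
    """Return (t_train, t_platform) for one mirror-to-mirror trip of the beam.

    mirror_gap_m is the separation of the mirrors, measured perpendicular to
    the motion of the train; train_speed is in m/s and must be below C.
    Both names are hypothetical, chosen for this illustration.
    """
    if not 0 <= train_speed < C:
        raise ValueError("train_speed must satisfy 0 <= train_speed < C")
    # Train frame: the beam goes straight across the gap.
    t_train = mirror_gap_m / C
    # Platform frame: the beam covers the hypotenuse of a right triangle with
    # legs mirror_gap_m and train_speed * t, so (C t)^2 = gap^2 + (v t)^2.
    t_platform = mirror_gap_m / math.sqrt(C**2 - train_speed**2)
    return t_train, t_platform


t_train, t_platform = light_clock_times(1.0, 0.6 * C)
ratio = t_platform / t_train  # the factor by which t exceeds t'
print(f"t' = {t_train:.3e} s, t = {t_platform:.3e} s, t/t' = {ratio:.4f}")
```

The ratio depends only on the train speed, not on the mirror gap, which is one way of seeing that the effect is not a peculiarity of this particular clock's size.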

But has time really expanded? Have lengths really contracted? No! For the observer sitting in the train, the same length contractions and time dilations will happen for phenomena on the platform. Basically there are two kinds of length: the length observed from the frame of the object, and the length observed from a moving frame. And the length observed from the moving frame is always shorter. The same can be said about time dilation.

Putnam A1, 2017

Putnam 2017, A1) Let S be the smallest set of positive integers such that

a) 2\in S

b) If n^2\in S, then n\in S

c) If n\in S, then (n+5)^2\in S

Which positive integers are not in S?

Although A1 is generally supposed to be one of the easiest problems on the Putnam, I have not been able to solve this problem in the past. Part of the difficulty of the problem arises from the fact that we are not given the answer, and then asked to prove it. Hence, it is easy to miss out on cases, and I for one found it pretty difficult to determine all the cases when I hadn’t looked at the answer (not the proof).

Proof: We prove that all numbers except 1 and multiples of 5 belong to S. We know from conditions b and c that n\in S\implies (n+5)\in S. Hence, if we can prove that 2,3,4 and 6 belong to S, then we will have proved the assertion.

Proving that 4 belongs to S is perhaps the only non-trivial step in this problem. Note that we have to find a number of the form 4^{2^n} to accomplish this, and numbers of the form 4^{2^n} (with n\geq 1) are 1\pmod 5. Although we don’t yet know if there exists a number in S that is 1\pmod 5, we do know that there exists a number that is 4\pmod 5, which is (2+5)^2=49. Applying condition c to this number, we get (49+5)^2=54^2=2916, which is 1\pmod 5. Now we’re in the game. We just need to find some 4^{2^n} which is larger than 54^2, and then add enough 5's until we attain it; for instance, 4^{2^3}=65536 works. Then we take n square roots to get 4.

Similarly, 6^{2^n} is 1\pmod 5. We find some 6^{2^n} that is larger than 54^2, add enough 5's to reach it, and then take n square roots to get 6\in S.

Now if 4\in S, then so does 4+5=9, and hence \sqrt{9}=3. Having proved that 2,3,4,6\in S, we’re done.
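The chains used to reach 4 and 6 can be written out explicitly. The following is a minimal sketch, not part of the original proof: the helper names are hypothetical, and the starting point is the element 54^2=2916 obtained by applying condition c to 49. The code looks for the first power of the form base^{2^n} that can be reached from 2916 by repeatedly adding 5, and then takes n exact square roots.

```python
from math import isqrt


def first_reachable_tower(base, seed):
    """Return (n, base**(2**n)) for the smallest n >= 1 such that the power
    is at least `seed` and congruent to `seed` modulo 5, i.e. reachable from
    `seed` by adding 5 repeatedly. Intended for base = 4 or 6 only."""
    n, value = 1, base ** 2
    while value < seed or (value - seed) % 5 != 0:
        n, value = n + 1, value ** 2
    return n, value


def square_root_chain(value, n):
    """Take n exact integer square roots, returning every intermediate value."""
    chain = [value]
    for _ in range(n):
        root = isqrt(chain[-1])
        if root * root != chain[-1]:
            raise ValueError(f"{chain[-1]} is not a perfect square")
        chain.append(root)
    return chain


seed = (49 + 5) ** 2  # 2 in S, so 49 in S, so 54^2 in S (condition c, twice)
n4, top4 = first_reachable_tower(4, seed)
n6, top6 = first_reachable_tower(6, seed)
chain4 = square_root_chain(top4, n4)
chain6 = square_root_chain(top6, n6)
print(seed, chain4, chain6)
```

Each step in a chain uses only the rule n\in S\implies n+5\in S (to climb from 2916 to the top of the chain) and condition b (to descend by square roots).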

(Part of) a proof of Sard’s Theorem

I have always wanted to prove Sard’s Theorem. Now I shall stumble my way into proving a deeply unsatisfying special case of it, after a whole day of dead ends and red herrings.

Consider first the special case of a smooth function f:\Bbb{R}\to\Bbb{R}. At first, I thought that the set of critical points of such a function has to be countable. Hence, the set of critical values should also be countable, which would make the measure of critical values 0. However, our resident pathological example of the Cantor set makes things difficult. Turns out that not only can the critical *points* be uncountable, but they can also have non-zero measure (of course the canonical example of such a smooth function involves a modified Cantor set of non-zero measure). In fact, even the much humbler constant function sees its set of critical points having positive measure. However, the set of critical *values* may still have measure 0, and it indeed does.

For f:\Bbb{R}\to\Bbb{R}, consider the restriction of f to [a,b]\subset \Bbb{R}. Note that the measure of critical points of f in [a,b] has to be finite (possibly 0). Note that f'(x) is bounded in [a,b]. Hence, at each critical *point* p in [a,b], given \epsilon>0, there exists a \delta(\epsilon)>0 such that if m(N(p))<\delta(\epsilon), then m(f(N(p)))<\epsilon. This is just another way of saying that we can control the measure of the image.

Note that the reason why I am writing \delta(\epsilon) is that I want to emphasize the behaviour of \frac{\epsilon}{\delta(\epsilon)}. As p is a critical point, at this point \lim\limits_{\epsilon\to 0}\frac{\epsilon}{\delta(\epsilon)}=0. This comes from the very definition of the derivative of a function being 0.
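To see the ratio \frac{\epsilon}{\delta(\epsilon)} going to 0 in a concrete case, here is a small worked example. The function f(x)=x^2 and the critical point p=0 are chosen by me purely for illustration, with N(p) taken to be the symmetric interval of length \delta around 0.

```latex
% Assumed example: f(x) = x^2, critical point p = 0, N(p) symmetric about 0.
\begin{align*}
N(p) &= \left(-\tfrac{\delta}{2},\,\tfrac{\delta}{2}\right), & m(N(p)) &= \delta,\\
f(N(p)) &= \left[0,\,\tfrac{\delta^2}{4}\right), & m(f(N(p))) &= \tfrac{\delta^2}{4} =: \epsilon,\\
\frac{\epsilon}{\delta} &= \frac{\delta}{4} \to 0 & &\text{as } \delta\to 0.
\end{align*}
```

Away from a critical point the same ratio would instead tend to |f'(p)|\neq 0, which is why only the intervals containing critical points are kept in the covering argument.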

Divide the interval [a,b] into cubes of length <\delta(\epsilon). Retain only those cubes which contain at least one critical point, and discard the rest. Let the final remaining subset of [a,b] be A. Then the measure of f(A)\leq \text{number of cubes}\times\epsilon. The number of cubes is \frac{m(A)}{\delta(\epsilon)}. Hence, m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}. Note that f(A) contains all the critical values.

As \epsilon\to 0, we can repeat this whole process verbatim. Everything pretty much remains the same, except for the fact that \frac{\epsilon}{\delta(\epsilon)}\to 0. Hence, m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}\to 0. This proves that the set of critical values has measure 0, when f is restricted to [a,b].

Now when we consider f over the whole of \Bbb{R}, we can just subdivide it into \cup [n,n+1], note that the set of critical values for all these intervals has measure 0, and hence conclude that the set of critical values for f over the whole of \Bbb{R} also has measure 0.

Note that this can be generalized to any f:\Bbb{R}^n\to \Bbb{R}^n.

Also, the case for f:\Bbb{R}^m\to \Bbb{R}^n where m<n is trivial, as the image of \Bbb{R}^m itself should have measure 0.

Furstenberg’s topological proof of the infinitude of primes

Furstenberg and Margulis won the Abel Prize today. In honor of this, I spent the better part of the evening trying to reconstruct Furstenberg’s topological proof of the infinitude of primes. I was going down the wrong road at first, but then, after ample hints from Wikipedia and elsewhere, I was able to come up with Furstenberg’s original argument.

Furstenberg’s argument: Consider \Bbb{N}, and a topology on it in which the open sets are generated by \{a+b\Bbb{N}\}, where a,b\in\Bbb{N}. It is easy to see that the sets \{p\Bbb{N}\} are also closed, as the complement of each is the union of the other residue classes modulo p. Non-empty open sets, being unions of infinite generators, have to be infinite. However, if there are a finite number of primes p_1,\dots,p_n, then the set \Bbb{N}\setminus (\cup_i \{p_i\Bbb{N}\})=\{1\} is open (being a finite intersection of open sets) and finite, which is a contradiction.
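The two set-theoretic facts used in the argument can be checked on a finite window of \Bbb{N}. This is only a sketch under my own assumptions: the window size of 60, the helper names, and the "pretend" list of primes (2, 3, 5) are all hypothetical. It checks that the complement of p\Bbb{N} is the union of the other residue classes, and shows what is left of the window after removing the multiples of an incomplete list of primes.

```python
LIMIT = 60  # size of the finite window {1, ..., LIMIT}; an arbitrary choice
window = set(range(1, LIMIT + 1))


def progression(a, b, limit):
    """Elements of {a, a+b, a+2b, ...} that are at most `limit`."""
    return set(range(a, limit + 1, b))


def other_residue_classes(p, limit):
    """Union of r + pN for r = 1, ..., p-1, restricted to the window."""
    union = set()
    for r in range(1, p):
        union |= progression(r, p, limit)
    return union


# The complement of pN is a union of generators, so pN is closed.
closed_check = all(
    window - progression(p, p, LIMIT) == other_residue_classes(p, LIMIT)
    for p in (2, 3, 5, 7)
)

# Pretend 2, 3, 5 were the only primes and remove all their multiples.
pretend_primes = (2, 3, 5)
multiples = set().union(*(progression(p, p, LIMIT) for p in pretend_primes))
leftover = sorted(window - multiples)
print(closed_check, leftover)
```

With an incomplete list of primes the leftover set is much larger than \{1\}; the argument says that it can never shrink to \{1\} for any finite list.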

My original flawed proof: Let \{a+b\Bbb{N}\} be connected sets in this topology. Then, as one can see clearly, \Bbb{N}=\{2\Bbb{N}\}\cup\{1+2\Bbb{N}\}; in other words, it is the union of two open disjoint sets. Therefore, it is not connected. If the number of primes is finite, then \cap \{p_i\Bbb{N}\}=\{p_1p_2\dots p_n\Bbb{N}\}, which is itself an open connected set. Hence, as all \{p_i\Bbb{N}\} have a non-empty intersection which is open and connected, the union of all such open sets \cup \{p_i\Bbb{N}\} must lie in a single component. This contradicts the fact that \cup\{p_i\Bbb{N}\}=\Bbb{N}.

This seemed too good to be true. Upon thinking further, we realize the fact that our original assumption was wrong. \{a+b\Bbb{N}\} can never be a connected set, as it is itself made up of an infinite number of open sets. In fact, it can be written as a union of disjoint open sets in an infinite number of ways. This topology on \Bbb{N} is bizarrely disconnected.

Proving inequalities using convex functions

I have found that I am pretty bad at finding “clever factors” for Cauchy-Schwarz, whose bounds can be known from the given conditions. However, I am slowly getting comfortable with the idea of converting the expression into a convex function, and then using the Majorization Theorem.

(Turkey) Let n\geq 2, and x_1,x_2,\dots,x_n positive reals such that x_1^2+x_2^2+\dots+x_n^2=1. Find the minimum value of \sum\limits_{i=1}^n \frac{x_i^5}{\sum\limits_{j\neq i} x_j}

My proof: By the Cauchy-Schwarz inequality, (x_1+\dots+x_n)^2\leq n(x_1^2+\dots+x_n^2)=n. Hence, x_1+\dots+x_n\leq \sqrt{n}, and so \sum\limits_{j\neq i} x_j\leq \sqrt{n}-x_i for each i.

The expression we finally get is

\sum\limits_{i=1}^n \frac{x_i^5}{\sum\limits_{j\neq i} x_j}\geq \sum\limits_{i=1}^n \frac{x_i^5}{\sqrt{n}-x_i}

Consider the function f(x)=\frac{x^{5/2}}{\sqrt{n}-\sqrt{x}}. By differentiating twice, we know that for 0<x\leq 1, this function is convex.

Hence, by the Majorization Theorem, f(x_1^2)+\dots+f(x_n^2) is minimized when x_1^2=\dots=x_n^2, which we know from the conditions given in the question, is \frac{1}{n}.

Note that

f(x_1^2)+\dots+f(x_n^2)= \sum\limits_{i=1}^n \frac{x_i^5}{\sqrt{n}-x_i}

Hence, the minimum is attained when x_1=\dots=x_n=\frac{1}{\sqrt{n}}, and is equal to nf(\frac{1}{n})=\frac{1}{n(n-1)}. Both inequalities used above become equalities at this point, so this is also the value found by substituting x_1=\dots=x_n=\frac{1}{\sqrt{n}} in the original expression.
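A quick numerical experiment can support the claimed minimum. The sketch below makes its own assumptions: n=4, a fixed random seed, 2000 sample points, and hypothetical helper names. It evaluates the expression at the symmetric point and at random points of the constraint set x_1^2+\dots+x_n^2=1, and compares both with \frac{1}{n(n-1)}. It is a plausibility check, not a proof.

```python
import math
import random


def turkey_sum(xs):
    """Evaluate sum_i x_i^5 / (sum of the other x_j)."""
    total = sum(xs)
    return sum(x**5 / (total - x) for x in xs)


def random_constraint_point(n, rng):
    """Random positive point, rescaled so that the squares sum to 1."""
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]


n = 4                    # assumed dimension for the experiment
rng = random.Random(0)   # fixed seed so the run is reproducible
claimed_min = 1 / (n * (n - 1))
value_at_equal = turkey_sum([1 / math.sqrt(n)] * n)
smallest_sampled = min(
    turkey_sum(random_constraint_point(n, rng)) for _ in range(2000)
)
print(claimed_min, value_at_equal, smallest_sampled)
```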

A beautiful generalization of the Nesbitt Inequality

I want to discuss a beautiful inequality, that is a generalization of the famous Nesbitt inequality:

(Romanian TST) For positive a,b,x,y,z, prove that \frac{x}{ay+bz}+\frac{y}{az+bx}+\frac{z}{ax+by}\geq \frac{3}{a+b}

Clearly, if a=b, then we get Nesbitt’s inequality, which states that

\frac{x}{y+z}+\frac{y}{z+x}+\frac{z}{x+y}\geq \frac{3}{2}.

This is question 14 in Mildorf’s “Olympiad Inequalities”, and its solution consists of finding a factor to multiply this expression with, almost out of thin air, and then using the Cauchy-Schwarz and AM-GM inequalities to prove the assertion. My solution is the following:

On interchanging a and b, the right hand side remains the same, and the inequality to be proved becomes

\frac{x}{az+by}+\frac{y}{ax+bz}+\frac{z}{ay+bx}\geq \frac{3}{a+b}

On adding these two inequalities, we get

\frac{x}{ay+bz}+\frac{x}{az+by}+\frac{y}{az+bx}+\frac{y}{ax+bz}+\frac{z}{ax+by}+\frac{z}{ay+bx}\geq \frac{6}{a+b}

Multiplying both sides by \frac{1}{2}(a+b) and then adding 6 on both sides, the right hand side becomes 9, and the left hand side is bounded below by the left hand side of

2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})\geq 9

This is obviously true by Cauchy Schwarz. We will explain below how we got this expression.

Let us see what happens to \frac{x}{ay+bz}+\frac{x}{az+by} in some detail. After multiplying by \frac{1}{2}(a+b) and adding 2, we get

\frac{\frac{1}{2}(a+b)x+ay+bz}{ay+bz}+\frac{\frac{1}{2}(a+b)x+az+by}{az+by}\geq 2\frac{(a+b)(x+y+z)}{(a+b)(y+z)}=2\frac{(x+y+z)}{y+z}.

EDIT: I assumed that this was obviously true. However, it is slightly non-trivial that this is true. For \frac{a}{b}+\frac{c}{d}\geq 2\frac{a+c}{b+d}, the condition that should be true is that (b-d)(bc-ad)\geq 0. This is true in our case above.

After adding the other terms also, the lower bound becomes

2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})

As pointed out above, this is clearly \geq 9 by Cauchy-Schwarz.

Hence proved
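The inequality itself is easy to test numerically. The following is a minimal sketch with assumptions of my own (the sampling range, the random seed, the number of samples, and the function name); it samples positive values of a,b,x,y,z and records the smallest observed gap between the two sides, together with the equality case x=y=z.

```python
import random


def cyclic_sum(a, b, x, y, z):
    """Left hand side: x/(ay+bz) + y/(az+bx) + z/(ax+by)."""
    return x / (a * y + b * z) + y / (a * z + b * x) + z / (a * x + b * y)


rng = random.Random(1)  # fixed seed so the run is reproducible


def sample():
    return rng.uniform(0.1, 10.0)


worst_gap = float("inf")
for _ in range(5000):
    a, b, x, y, z = (sample() for _ in range(5))
    worst_gap = min(worst_gap, cyclic_sum(a, b, x, y, z) - 3 / (a + b))

# Equality is expected when x = y = z, for any positive a and b.
equality_gap = cyclic_sum(2.0, 5.0, 1.0, 1.0, 1.0) - 3 / (2.0 + 5.0)
print(worst_gap, equality_gap)
```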

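Note: For the sticklers, the rigorous version of the argument runs in the forward direction. The estimates above show that \frac{1}{2}(a+b) times the sum of the six fractions, plus 6, is at least

2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})\geq 9, which proves the summed inequality with \frac{6}{a+b} on the right. This bounds the two cyclic sums together; to bound \frac{x}{ay+bz}+\frac{y}{az+bx}+\frac{z}{ax+by} on its own, one can apply Cauchy-Schwarz directly:

\frac{x}{ay+bz}+\frac{y}{az+bx}+\frac{z}{ax+by}\geq \frac{(x+y+z)^2}{(a+b)(xy+yz+zx)}\geq \frac{3}{a+b}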

A small note on re-defining variables to prove inequalities

I just want to record my solution to the following problem, as it is different from the one given online.

For a,b,c,d positive real numbers, prove that \frac{1}{a}+\frac{1}{b}+\frac{4}{c}+\frac{16}{d}\geq \frac{64}{a+b+c+d}

This has a fairly straightforward solution using the Cauchy-Schwarz inequality, which for some reason I did not think of.

The way that I solved it is that I re-defined the variables: let a=8a', b=8b', c=16c' and d=32 d'. Then this is equivalent to proving that \frac{1}{8}\frac{1}{a'}+\frac{1}{8}\frac{1}{b'}+\frac{1}{4}\frac{1}{c'}+\frac{1}{2}\frac{1}{d'}\geq \frac{1}{\frac{a'}{8}+\frac{b'}{8}+\frac{c'}{4}+\frac{d'}{2}}

This is easily seen to be a consequence of Jensen’s inequality, as \frac{1}{x} is a convex function for positive x.
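The substitution can be double-checked with exact rational arithmetic. This sketch is only an illustration; the function names, the sampling range, and the random seed are my own choices. It verifies that the Jensen weights \frac{1}{8},\frac{1}{8},\frac{1}{4},\frac{1}{2} sum to 1, that the inequality holds on random rational inputs, and that equality occurs when a'=b'=c'=d', i.e. when a:b:c:d=1:1:2:4.

```python
from fractions import Fraction
import random


def lhs(a, b, c, d):
    """1/a + 1/b + 4/c + 16/d, computed exactly for Fraction inputs."""
    return 1 / a + 1 / b + 4 / c + 16 / d


def rhs(a, b, c, d):
    """64 / (a + b + c + d)."""
    return 64 / (a + b + c + d)


weights = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]
weights_sum = sum(weights)

rng = random.Random(2)  # fixed seed so the run is reproducible


def random_fraction():
    return Fraction(rng.randint(1, 50), rng.randint(1, 50))


always_holds = all(
    lhs(*point) >= rhs(*point)
    for point in ([random_fraction() for _ in range(4)] for _ in range(2000))
)

# Equality case: a' = b' = c' = d', i.e. (a, b, c, d) proportional to (1, 1, 2, 4).
equality_point = (Fraction(1), Fraction(1), Fraction(2), Fraction(4))
print(weights_sum, always_holds, lhs(*equality_point), rhs(*equality_point))
```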

A proof of Muirhead’s Inequality

I’ve been reading Thomas Mildorf’s Olympiad Inequalities, and trying to prove the 12 Theorems stated at the beginning. I’m recording my proof of Muirhead’s Inequality below. Although it is probably known to people working in this area, I could not find it on the internet.

Muirhead’s Inequality states the following: if the sequence a_1,\dots,a_n majorizes the sequence b_1,\dots,b_n, then for nonnegative numbers x_1,\dots,x_n, we have the following inequality

\sum\limits_{\text{sym}} x_1^{a_1}x_2^{a_2}\dots x_n^{a_n}\geq \sum\limits_{\text{sym}}x_1^{b_1}x_2^{b_2}\dots x_n^{b_n}

We will derive it as an easy consequence of the fact that for a convex function f, if the sequence \{a_i\} majorizes the sequence \{b_j\}, then f(a_1)+\dots f(a_n)\geq f(b_1)+\dots+f(b_n). This is theorem 9 in Mildorf’s document (the Majorization Theorem).

We will prove the Muirhead Inequality by induction on the number of x_i's. For just one such x_i, this is true by the Majorization Inequality. Now assume that it is true for (n-1) number of x_i's. Then the statement of Muirhead’s Inequality is equivalent to the statement

(x_1^{a_1}+x_1^{a_2}+\dots x_1^{a_n})(x_2^{a_1}+x_2^{a_2}+\dots x_2^{a_n})\dots (x_n^{a_1}+x_n^{a_2}+\dots x_n^{a_n})\geq (x_1^{b_1}+x_1^{b_2}+\dots x_1^{b_n})(x_2^{b_1}+x_2^{b_2}+\dots x_2^{b_n})\dots (x_n^{b_1}+x_n^{b_2}+\dots x_n^{b_n})

The above is true because f_i(y)=x_i^y is a convex function, as it is an exponential function. Hence, f_i(a_1)+\dots+f_i(a_n)\geq f_i(b_1)+\dots+f_i(b_n). Therefore, x_i^{a_1}+\dots+x_i^{a_n}\geq x_i^{b_1}+\dots+x_i^{b_n}.

It follows that

(x_1^{a_1}+x_1^{a_2}+\dots x_1^{a_n})(x_2^{a_1}+x_2^{a_2}+\dots x_2^{a_n})\dots (x_n^{a_1}+x_n^{a_2}+\dots x_n^{a_n})\geq (x_1^{b_1}+x_1^{b_2}+\dots x_1^{b_n})(x_2^{b_1}+x_2^{b_2}+\dots x_2^{b_n})\dots (x_n^{b_1}+x_n^{b_2}+\dots x_n^{b_n})

A note on the induction: Consider the base case, where we just have x_1. Then the inequality is equivalent to proving that

x_1^{a_1}+x_1^{a_2}+\dots+x_1^{a_n}\geq x_1^{b_1}+x_1^{b_2}+\dots+x_1^{b_n}, which is true by the Majorization Inequality.

Now if we have two x_i's, say x_1 and x_2, then this is equivalent to the inequality (x_1^{a_1}+\dots+x_1^{a_n})(x_2^{a_1}+\dots+x_2^{a_n})\geq (x_1^{b_1}+\dots+x_1^{b_n})(x_2^{b_1}+\dots+x_2^{b_n}).

We get the above inequality + terms of the form (x_1x_2)^{a_1}+\dots+(x_1x_2)^{a_n}, which we know from the Majorization Inequality is \geq (x_1x_2)^{b_1}+\dots+(x_1x_2)^{b_n}

The induction will work similarly for higher numbers of x_i's.
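For readers who want to experiment with the statement, here is a small numerical sketch. The helper names, the exponent sequences, and the random sampling are my own assumptions. It checks the majorization condition and compares the two symmetric sums by brute force over all permutations of the variables.

```python
from itertools import permutations
import math
import random


def majorizes(a, b):
    """True if sequence a majorizes sequence b (equal totals, and the partial
    sums of a sorted in decreasing order dominate those of b)."""
    a_sorted = sorted(a, reverse=True)
    b_sorted = sorted(b, reverse=True)
    if len(a_sorted) != len(b_sorted) or not math.isclose(sum(a), sum(b)):
        return False
    partial_a = partial_b = 0.0
    for ai, bi in zip(a_sorted, b_sorted):
        partial_a += ai
        partial_b += bi
        if partial_a < partial_b - 1e-12:
            return False
    return True


def symmetric_sum(xs, exponents):
    """Sum over all permutations of xs of the product x_1^{e_1} ... x_n^{e_n}."""
    return sum(
        math.prod(x**e for x, e in zip(perm, exponents))
        for perm in permutations(xs)
    )


rng = random.Random(3)  # fixed seed so the run is reproducible
pairs = [((2, 0, 0), (1, 1, 0)), ((3, 1, 0), (2, 1, 1))]
smallest_gap = float("inf")
for a, b in pairs:
    for _ in range(500):
        xs = [rng.uniform(0.1, 5.0) for _ in range(3)]
        gap = symmetric_sum(xs, a) - symmetric_sum(xs, b)
        smallest_gap = min(smallest_gap, gap)
print(smallest_gap)
```

The first pair is the familiar x^2+y^2+z^2\geq xy+yz+zx (each side counted twice by the symmetric sum).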

A simpler way to obtain smooth functions than convolutions?

Of the many mathematical concepts that I don’t understand, one of the more important ones is the convolution of functions. It is defined in the following way:

(f*g)(x)=\int_{-\infty}^{\infty} f(x-y)g(y) dy

Our guiding principle should be that we want to make "*" an abelian group action (although inverses are not always present, at least when talking about integrable functions).

However, perhaps the reason why we thought of this action in the first place was that we wanted smooth functions out of just integrable functions. For instance, given any integrable function f, if \phi(x) is a smooth compactly supported function, (\phi*f)(x)=\int_{-\infty}^{\infty}\phi(x-y)f(y)dy will be smooth (provided we can bring the derivatives under the integral sign, which is related to the Dominated Convergence Theorem).

However, why have this complicated definition? Why not just consider the function g(x)=\int_{-\infty}^{\infty}\phi(x)f(y)dy? Clearly, if \phi(x) is smooth, so is this one.

One of the reasons why we perhaps want the more complicated definition of (f*g)(x)=\int_{-\infty}^{\infty} f(x-y)g(y) dy is that there does not exist a function \phi(x) such that \int_{-\infty}^{\infty} \phi(x)f(y)dy=f(x) for all functions f(x). Hence, there cannot exist an identity element for the set of integrable functions on the real line. Also, this definition is clearly not commutative. I’d be interested in knowing your thoughts about what other purposes convolution serves, that this simple definition does not.
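A discrete analogue makes the comparison concrete. The sketch below is only an illustration under my own assumptions: sequences stand in for functions, sums stand in for integrals, and the step signal and three-point kernel are arbitrary choices. It compares the convolution with the simpler product-style definition and checks commutativity and the existence of an identity in the discrete setting.

```python
def convolve(f, g):
    """Discrete convolution: result[n] = sum over i of f[i] * g[n - i]."""
    result = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            result[i + j] += fi * gj
    return result


step = [0.0] * 5 + [1.0] * 5   # a jump from 0 to 1, standing in for f
kernel = [0.25, 0.5, 0.25]     # a small averaging kernel, standing in for phi

smoothed = convolve(kernel, step)         # the jump is spread over several samples
naive = [k * sum(step) for k in kernel]   # phi(x) * (sum of f): just a rescaled kernel

commutes = convolve(kernel, step) == convolve(step, kernel)
delta = [1.0]                             # discrete unit impulse
identity_works = convolve(delta, step) == step
print(smoothed, naive, commutes, identity_works)
```

In this discrete setting the unit impulse acts as an identity for convolution, whereas the simpler definition only ever returns a multiple of the kernel and so forgets the shape of f.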