Capacitors and making complicated shapes simpler

The paper that I intend to blog about is: Parallel-Plate Capacitor by the Schwartz-Christoffel Transformation by Harlan B. Palmer.

I intend to read some research papers in various academic fields, and, perhaps amateurishly, blog about them. We could all be better about what is happening outside of our narrow fields of interest. Although I hope to be regular about this, it is likely that this is going to be a sporadic bunch of posts until my employer asks that I devote more time to my day job.

This is the paper by Harlan B. Palmer that I am referring to.

How do we measure the electrical “energy” (electrostatic potential energy) or electrical field strength of a capacitor? If we can somehow measure the flux density between the plates of the capacitor, we will have the answer. However, in such a calculation, we only consider the sides of the plates facing each other. We don’t consider the back parts of these plates. These also have charge, and hence affect the electrical field strength. The author contends that the true value of the capacitance, which depends on the electrostatic potential, might be “several thousand percent greater than indicated”.

Another complicating factor is “fringing”. Near the ends of the plates, the flux lines veer towards the center of the opposite plates, and don’t go straight up. This makes the flux lines more difficult to deal with mathematically.

Consider the figure given below:

We take a polygon containing the front and back sides of both plates. So ideally, if we can integrate the flux attached to all sides of the plates, we should be able to calculate capacitance. However, the flux lines are all twisty and non-uniform, the sides of the polygon seem complicated and difficult to deal with, and the whole enterprise looks very challenging.

What if we could “straighten” this messy polygon into a rectangle, such that all the flux lines pass through the sides of this rectangle uniformly? That would make life simple! This miraculous transformation is achieved by the Schwartz-Christoffel transformation.

The Schwartz-Christoffel Transformation takes the figure given below, and maps it to any polygon that is desired:

Moreover, this transformation is reversible. Hence, effectively we can map any polygon on the complex plane to any other polygon on the complex plane, and that too conformally. What this means is that we can preserve the Physics principles. The lines of electric flux should always be perpendicular to equipotential lines, and because this is the case in the complicated polygon, this will remain the case even in the simpler rectangle.

The Schwartz-Christoffel transformation looks like this:

Z=\int (\zeta-\zeta_a)^{m_a}(\zeta-\zeta_b)^{m_b}\dots (\zeta-\zeta_n)^{m_n}d\zeta

Here, the m_i‘s are telling Z what angle to turn by, in order to form the desired polygon. Also, we can determine the lengths of the sides of the polygons such that the flux lines attached to the plates indeed have uniform density.

What I think has happened is this. Look at the rectangle below:

If you take the top plate of the capacitor, peel out the front and back sides of this plate and lay them side by side, you’ll get the top side of this rectangle. Similarly, if you take the front and back sides of the lower plate of a capacitor and law them out side by side, you’ll get the bottom plate of this rectangle. Moreover, there is no “fringing” as you may imagine that the thin polygon in Figure 1b has been straightened out. Hence, the flux lines are now uniform. This will now lead to easy calculations without error.


  1. Parallel-Plate Capacitor by the Schwartz-Christoffel Transformation, by Harlan B. Palmer

Some problems from Indian National Mathematics Olympiad, 2014

I just want to add a couple of problems from INMO 2014 that I solved this morning. The first problem was slightly less tricky, and just involved pairing divisors with each other in the most obvious way. However, the second problem is quite devious, and is more of an existence proof than writing down an explicit winning strategy.

  • Let n be a natural number. Prove that \left\lfloor \frac{n}{1} \right\rfloor+\left\lfloor \frac{n}{2} \right\rfloor+\dots+\left\lfloor \frac{n}{n} \right\rfloor+\left\lfloor \sqrt{n} \right\rfloor is even.

We will prove this by induction. Let f(n)=\left\lfloor \frac{n}{1} \right\rfloor+\left\lfloor \frac{n}{2} \right\rfloor+\dots+\left\lfloor \frac{n}{n} \right\rfloor+\left\lfloor \sqrt{n} \right\rfloor. Note that f(1)=2 is even. Also, note that f(i)=f(i+1)+ the number of divisors of i+1. It is precisely at those divisors that 1 is added to f(i+1). If i+1 is not a square, then the number of divisors can be paired up as (d,\frac{i+1}{d}). Hence, the number of divisors should be even. Moreover, \lfloor \sqrt{i}\rfloor=\lfloor \sqrt{i+1}\rfloor. However, if i+1 is indeed a square, then \sqrt{i+1} is a divisor that cannot be paired up with a different divisor as \frac{i+1}{\sqrt{i+1}}=\sqrt{i+1} itself. However, \lfloor \sqrt{i+1}\rfloor=\lfloor \sqrt{i}\rfloor+1. Hence, we get two extra 1’s, which still keeps f(i+1) even. Hence, f(i) and f(i+1) have the same parity. Now as f(1)=2 is even, f(i) is even for all i\geq 1. Hence proved. QED

  • Written on a blackboard is the polynomial x^2+x+2014. Calvin and Hobbes take turns alternately (starting with Calvin) in the following game. During his turn, Calvin should either increase or decrease the coefficient of x by 1. And during his turn, Hobbes should either increase or decrease the constant coefficient by 1. Calvin wins if at any point of time the polynomial on the blackboard has integer roots. Prove that Calvin has a winning strategy.

Calvin’s first move should be to increase the coefficient of x by 1. We will prove that Calvin has a winning strategy by induction.

Observe that for x^2+x+2, Calvin has the following winning strategy: After the first move, the polynomial becomes x^2+2x+2. Now if Hobbes keeps increasing the constant coefficient, Calvin should do the same. If Hobbes reduces the constant coefficient at any step, the polynomial will becomes x^2+ax+(a-1)=0, which has the integer roots 1,a-1. Hence, Calvin has a winning strategy for x^2+x+2k when k=1.

Note that for the general polynomial x^2+x+2k, if both players keep increasing the coefficients without ever decreasing them, ultimately the polynomial becomes x^2+(1+k+3)x+(2k+k+3)=0, which has the integer roots 3,k+1. Hence, in order to avoid losing, Hobbes should reduce the constant coefficient at least once.

We will use a form of strong induction here. If Calvin has a winning strategy for x^2+x+2k, then he has a winning strategy for any x^2+(1+a)x+(2k+a)=0, where a\leq k+3.

Let us assume that Calvin has a winning strategy for x^2+x+2k. From the last statement, we know that this implies that Calvin has a winning strategy for x^2+(1+a)x+(2k+a)=0, where a\leq k+3. Now let us consider x^2+x+2(k+1)=0. If Hobbes keeps increasing the constant coefficient, then we know that Calvin should do the same, and that is a winning strategy. If after a moves, where a<k+1+3, if Hobbes decreases the constant coefficient on the (a+1)th move, then the polynomial becomes x^2+(1+a+1)x+(2k+2+a-1)=x^2+(1+a+1)+(2k+a+1)=0. By strong induction, we know that Calvin has a winning strategy for this. Hence proved. QED.

Exploring Indian Mathematical Olympiads

Indian math olympiad questions are famous (infamous?) for being very analytical. There mostly do not exist any clever one line proofs. Multiple cases have to be analyzed and exhaustively eliminated before arriving upon the correct answer.

I tried solving problems from the Indian National Mathematics Olympiad, 2013 today. My solutions are different (lengthier, and hence perhaps instructive) from the official solutions given online. Hopefully they can be of help to anyone interested in mathematical problem solving.

The Indian National Mathematics Olympiad (INMO) is offered to those candidates who have cleared the Regional Mathematics Olympiad (RMO). The questions that are given are of a difficulty level comparable with P1, P4 of the International Math Olympiad.

I do plan on running a seminar on competitive math problem solving this summer. This is useful for all sorts of things, like getting better at algorithms, improving analytical skills in finance, etc. Please do let me know if you’d be interested in that sort of thing.

  • Find all positive integers m,n and primes p\geq 5, such that m(4m^2+m+12)=3(p^n-1)

First, we make a couple of observations. As p^n-1 is a multiple of 2, m will also have to be a multiple of 2. This can be re-written as 4m^3+m^2+12m+3=3p^n. On factorizing the left hand side, we get (4m+1)(m^2+3)=3p^n. Hence, as the right hand side is a multiple of 3, either 4m+1 is a multiple of 3, or m is a multiple of 3.

If 4m+1 is a multiple of 3 and m is not, then it has to be a multiple of 9, as m\equiv 2\pmod 3 is the only possibility. This is impossible as p\geq 5. Hence, 3|m. Note that from our previous observation, we also know that 2|m. Hence, combining the two, we get 6|m.

Let m=6m'. Then we have (24m'+1)(12m'^2+1)=p^n (we obtain this after canceling out 3 on both sides). For m'\geq 2, we have 24m'+1\leq 12m'^2+1. Hence, as the right hand side is p^n, we have 24m'+1|12m'^2+1. This implies that 24m'+1|12m'^2+1-(24m'+1)=12m'(m'-2). The right hand side is a multiple of 12m' while the left hand side is not. Hence, this is only possible if m'-2=0, or m'=2. As m=6m', we get m=12.

From this, we can deduce that (24m'+1)(12m'^2+1)=49^2, and hence p=7 and n=4. (12,7) is the only solution. QED

  • Let a,b,c,d be positive integers such that a\geq b\geq c\geq d. Prove that x^4-ax^3-bx^2-cx-d has no integer solutions.

Let x be a solution. Clearly x=0 cannot be a solution, as then we would have d=0, which is impossible as d is a positive integer. Hence, let us assume that x is a non-zero integer. Then we have x^4-ax^3-bx^2-cx=d. Therefore, x|d. Let d'=\frac{d}{x}. Then we have x^3-ax^2-bx-c=d'. Continuing like this, we get x=a+b'+c''+d''', where d'''=\frac{d}{x^3}, c''=\frac{c}{x^2}, and b'=\frac{b}{x^2}.

In other words, we have x=a+\frac{b+c'+d''}{a+b'+c''+d'''}. Clearly, as a\geq b\geq c\geq d, we have \frac{b+c'+d''}{a+b'+c''+d'''}\leq \frac{a+b'+c''}{a+b'+c''+d'''}<1. Hence, $x$ is not an integer. Hence proved. QED

  •  Let n be a positive integer. Call a nonempty subset S of \{1,2,\dots,n\} good if the arithmetic mean of the elements of S is also an integer. Further, let t_n denote the number of good subsets of \{1,2,\dots,n\}. Prove that t_n and n are either both odd or even.

Clearly, this is equivalent to proving that t_n-n is always even. Note that subsets consisting of one element have an arithmetic mean that is an integer. Hence, we can prove that the number of good subsets of \{1,2,\dots,n\} containing at least two elements will be even. The way that we’ll prove this assertion is by forming a pairing between sets: if \{a_1,\dots,a_k\} is one such subset, then consider the set \{n+1-a_1,\dots,n+1-a_k\}. Clearly this set will also have an integer average, and the map i\to n+1-a is idempotent. Hence, the mapping is bijective.

Case 1: Let n be even. The only cases in which bijective pairing maps a set to itself is for every i in the set, n+1-i is also included in it. As there is no i\in\{1,2,\dots,n\} such that i=n+1-i, as n+1 is odd, a set for which bijective pairing fails will have to contain an even number of elements. Hence, t_{2k+1} is even for all k\geq 1. For sets of the form t_{2k}, if i and n+1-i are both included, then the average will be \frac{k(n+1)}{2k}, which is clearly not an integer as n+1 is odd. Hence, every good set can be bijectively paired with a different good set, which implies that the total number of good sets is even.

Case 2: Let n be odd. Let us study the good sets for which the map i\to n+1-i does not produce a different set. The only different is that when i=\frac{n+1}{2}, i=n+1-i. Hence, sets of both odd and even number of elements can map to themselves through this pairing. However, if we can prove that t_{odd} and t_{even} are both odd, then their sum will be even. Combining this with the fact that the number of sets which map to other sets is anyway even, we get a total number of even sets.

Let us consider t_{2k+1}. Any set of this form will have to contain the element \frac{n+1}{2}. The total number of such sets is hence {n-1\choose 1}+{n-1\choose 2}+\dots+{n-1\choose n-1}=2^{n-1}-1, which is odd.

Let us now consider t_{2k}. Then the element \frac{n+1}{2} cannot be contained in such a set. Hence, the total number of elements is {n-1\choose 1}+\dots+{n-1\choose n-1}=2^{n-1}-1, which is odd.

Adding both cases together, we get an even number of sets that map to themselves through this mapping.

Hence, the total number of sets is even, which proves our assertion. QED

  •  Let a,b,c,x,y,z be positive real numbers such that a+b+c=x+y+z and abc=xyz. If a<b<c, x<y<z, and a\leq x<y<z\leq c, then prove that x=a,y=b and z=c.

This is one of my favorite problems. Without loss of generality, let x-a\geq c-z. Then keeping a+b+c constant, bring c\to z and a\to a+(c-z)=a'. This keeps b unchanged, and hence a'z\geq ac (as (a+x)(c-x)\geq ac), which implies a'bz\geq abc. Now keeping z constant, bring a'\to x and b\to y. This again implies that xy\geq a'b, which implies that xyz\geq a'bz\geq abc. Hence, xyz\geq abc if any of these changes happens. Hence, as abc=xyz, none of these changes should happen, and a=x,b=y and c=z. QED

An interesting Putnam problem on the Pigeonhole Principle

The following problem is contained in the book “Putnam and Beyond” by Gelca, and I saw it on stackexchange. I’m mainly recording this solution because it took me longer than usual to come up with the solution, as I was led down the wrong path many a time. Noting what is sufficient for a block of numbers to be a square is the main challenge in solving this problem.

Let there be a sequence of m terms, all of which belong to a set of n natural numbers. Prove that if 2^n\leq m, then there exists a block of consecutive terms, the product of which is a square number.

Let the n numbers be \{a_1,\dots,a_n\}, and consider the function f(k)=( a tuple of 0‘s and 1‘s), where the 0‘s and 1‘s denote the number of times \pmod 2 that each element a_i has appeared from the 1st to the kth element of the sequence of positive integers.

So f(1)=(1 somewhere, and the rest of the terms are 0), etc.

Clearly, if f(k)=(0,0,\dots,0) for any k, then the consecutive sequence of numbers from the 1st term to the kth terms is a square. If no f(k) is (0,0,0\dots,0), then there are 2^m-1 such tuples, and at least 2^m values of k. Hence, two of them must be equal. Let us suppose that f(k_1)=f(k_2). Then the sequence of terms from k_1 until k_2 is a square. Hence proved.

Proving that the first two and last two indices of the Riemann curvature tensor commute

I’ve always been confused with the combinatorial aspect of proving the properties of the Riemann curvature tensor. I want to record my proof of the fact that R(X,Y,Z,W)=R(Z,W,X,Y). This is different from the standard proof given in books. I have been unable to prove this theorem in the past, and hence am happy to write down my proof finally.

Define the function f(R(X,Y,Z,W))=R(X,Y,Z,W)-R(Z,W,X,Y). We want to prove that this function is 0.

By simple usage of the facts that R(X,Y,Z,W)+R(Y,Z,X,W)+(R(Z,X,Y,W)=0 and that switching the first two or last two vector fields gives us a negative sign, we can see that


Hence, f(R((X,Y,Z,W))=f(R(X,W,Y,Z))
Now note that R(X,Y,Z,W)=R(Y,X,W,Z). This is obtained by switching the first two and last two indices. However,


As f(R(X,Y,Z,W))= both positive and negative f(R(X,W,Y,Z)), we can conclude that it is 0.

Hence, R(X,Y,Z,W)=R(Z,W,X,Y).

It is not easy to prove this theorem because just manipulating the indices mindlessly (or even with some gameplan) can lead you down a rabbithole without ever reaching a conclusion. Meta-observations, like the above, are necessary to prove this assertion.

Thinking about a notorious Putnam problem

Consider the following Putnam question from the 2018 exam:

Consider a smooth function f:\Bbb{R}\to\Bbb{R} such that f\geq 0, and f(0)=0 and f(1)=1. Prove that there exists a point x and a positive integer n such that f^{(n)}(x)<0.

This is a problem from the 2018 Putnam, and only 10 students were able to solve it completely, making it the hardest question on the exam. I spent a day thinking about it, and my “proof” differs a lot from the official solutions, and is really a heuristic.

Proof: Assume that there does not exist any x and n such that f^{(n)}(x)<0. We will compare f with functions of the form x^n in [0,1]. We will prove that f\leq x^n on [0,1]. Because x^n\to 0 on [0,1) as n\to\infty, we will have proven that f=0 on [0,1) and f(1)=1. Hence, f cannot be smooth.

Why is f\leq x^n? Let us first analyze what f looks like. It is easy to see that f(x)=0 for x\leq 0. This is because as f\geq 0, if f(x)>0 for x<0, when f will have to decrease to 0 at x=0. Hence, there will be a negative derivative involved, which is a contradiction. Hence, f(x)=0 for x\leq 0, and by continuity of derivatives for smooth functions, all derivatives at x=0 are also 0.

Now consider the functions x^n, which are 0 at x=0 and 1 at x=1. These are the same endpoints for f(x) in [0,1]. If f(x) ever crosses x^n in [0,1), then it will have a higher nth derivative than x^n at the point of intersection. As its (n+1)th derivative is also non-negative, f will just keep shooting above x^n, and hence never “return” to x^n at x=1. This contradicts the fact that f(1)=1. Hence, f will always be bounded above by x^n in [0,1]. As this is true for all n, f=0 on [0,1) and f(1)=1. This contradicts the fact that f is continuous.

Putnam 2010

The Putnam exam is one of the hardest and most prestigious mathematical exams. Every year, more than 4,000 students, including math olympiad medalists from various countries, attempt the exam. The median score, almost every year, is 0. Each correctly answered question is worth 10 points

I often find myself trying to solve old Putnam problems, mainly because it serves as a way of “mathematical” procrastination- something that detracts me from my actual job of trying to solve my research problem. Yesterday, I ended up solving most questions from the 2010 Putnam paper. This is probably due to the fact that 2010 was one of the easiest papers in the recent past. When I compared my solutions with the solutions online, they turned out to be pretty different. Hence, I am recording them here.

The problems can be found here

A1: We need to find the largest number of boxes. Clearly, we need to place as few elements as possible in each box. If n is odd, then assume that we can place the largest element n in a box by itself. All the other boxes contain elements of the form \{i,n-i\}. Hence, we can have \frac{n+1}{2} boxes. Why can we not have an even larger number of boxes? Let a be the lowest number of elements in a box. If a=1, then the configuration above is the only possibility. If a\geq 2, then there are at most \frac{n-1}{2} boxes, which is lower than the number of boxes we have obtained above (\frac{n+1}{2}).

If n is even, then one such configuration is \{i,n+1-i\}. Using an argument similar to that given above, we can conclude that the highest number of boxes possible is \frac{n}{2}.

A2: First we prove that the derivative of the function above is periodic with period 1. For some x, we know that


Hence, f'(x+1)=\frac{f(x+2)-f(x+1)}{1}=f'(x). This way, we have proven that the derivative of f is periodic with period 1.

What this shows is it that the function repeats the same “shape” every time we move to the right or left by 1. If f is not a straight line, then there will be two points in [0,1] with different derivatives. Let us also assume without loss of generality that f(1)>f(0). Hence, f'(0)=\frac{f(1)-f(0)}{1} is positive. Let p\in[0,1] be the point with a derivative that is different from f'(0). However, f'(p)=\frac{f(p+1)-f(p)}{1} will be the same as f'(0) because f(p+1)-f(p)=f(1)-f(0). This is because the function repeats its shape with a period of 1. Hence,

f(p+1)-f(p)=\int_p^{p+1}{f'(x)} dx=\int_p^1 f'(x)dx+\int _0^p f'(x)dx=f(1)-f(0).

This contradicts the fact that f'(p)\neq f'(0).

Therefore, as the derivative at all points is the same, straight lines are the only solutions.

A3: Consider the curve \gamma: t\to (at,bt). Clearly, a\frac{dh}{dx}+b\frac{dh}{dy}=\frac{\partial}{\partial t}h(\gamma(t)). Now as h(\gamma(t))=\frac{\partial}{\partial t}h(\gamma(t)), clearly h(\gamma(t) is exponential, at least along \gamma. That contradicts the fact that h is bounded.

A4: Let n=2^ab, where b is not a power of 2. Then we claim that 10^{10^{10^n}}+10^{10^n}+10^n-1 is divisible by 10^{2^a}+1.

Proof: 10^n\equiv -1\pmod {10^{2^a}+1}.

Now 10^{10^n}=10^{10^{2^a b}}=10^{2^{2^a b}5^{2^a b}}=(10^{2^a})^{2^{2^ab-a}5^{2^ab}}. Hence, it is \equiv 1 \pmod {10^{2^a}+1}.

Similarly, 10^{10^{10^n}}\equiv 1\pmod {10^{2^a}+1}. Adding the residues up, we see that 10^{10^{10^n}}+10^{10^n}+10^n-1\equiv 0\pmod{10^{2^a}+1}

B1: Let us assume that this is true. Then we have a_1^2+a_2^2+\dots+a_n^2+\dots=2. If any of these elements, say a_i, is greater than 1, then a_i^{2n} will be much greater than 2n for large enough n. Hence, this contradicts the fact that a_1^{2n}+\dots+a_i^{2n}+\dots=2n. Therefore, all the a_i‘s are less than 1. The a_i‘s that are exactly equal to \{\pm 1\}, we can remove by subtracting them from the left hand side. Now for these numbers that are less than 1, as we increase n, \sum a_i^n is a decreasing sequence. However, the right hand side, which is n, is an increasing sequence. This is a contradiction. Therefore, no such sequence exists.

B2: The smallest such length is 3. Let A=(0,0). If the smallest length is 1, then without loss of generality let B=(1,0). Note that if C=(x,y), then both \sqrt{x^2+y^2} and \sqrt{(x-1)^2+y^2} need to be integers. This is impossible. Similarly, if the smallest length is 2, then without loss of generality, we can assume that B=(-2,0). Then if C=(x,y), then both \sqrt{x^2+y^2} and \sqrt{(x-2)^2+y^2} need to be integers. This can also be seen to be impossible, by a simple case by case analysis. Note that (n+1)^2-n^2=2n+1. Hence, the difference between \sqrt{x^2+y^2} and \sqrt{(x+2)^2+y^2} must be at least twice the smaller number plus 1. Without loss of generality, we can assume that x,y\geq 1 (y\neq 0 as the three points are not collinear). This leaves only a couple of possibilities for x,y which can be checked by hand to not be possible. Hence the minimum length is 3, which gives us the regular pythagorean triangle with sides 3,4,5. The coordinate of the points will be (0,0), (3,0) and (0,4).

B3: This is indeed a fantastic problem, with a simple solution: first transfer all the balls to B_1, and then transfer them one by one to the other baskets. As we can only draw i balls from the B_ith basket, we need to ensure that at least one basket B_i is such that it has i balls. Otherwise, no balls can be moved from the initial configuration. Hence, the minimal number of balls is 0+1+\dots+2009+1=\frac{2009\times 2010}{2}+1. Hence, the minimal value of n is the upper floor value of this number divided by 2010, which is 1005.

B5: This is a pretty tricky problem. The way that I finally thought through is is to visualize what f(f(x)) would look like for an increasing function f(x). If f(x) is always positive, then f(f(x)) for x\to-\infty would be within an \epsilon distance of a fixed positive number, which is a contradiction as f'(x) gets arbitrarily close to 0 for x\to -\infty. Now we shall allow f(x) to be negative. If f(x) is not bounded below, then there will exist negative x such that f(f(x)) will be negative. This contradicts the fact that f'(x) is always non-negative. Now consider f(x) which is bounded below, but can still be negative. Clearly f(x) has to become positive at some point, as f'(x) must be positive. If it becomes positive only when x>0, then f(f(x)) for x<0 will still be negative, which is a contradiction as f'(x)=f(f(x)) must be non-negative. Let -a (a is assumed to be positive here) be \lim\limits_{x\to-\infty} f(x). If f(x) becomes positive when x>-a, then f(f(x)) for x\to-\infty will be negative, which is again a contradiction. Hence, f(x) has to become positive for x<-a. Now if f(-a)>0, then f(f(x)) for x\to -\infty will be arbitrarily close to a fixed positive number, which is a contradiction as f'(x) for x\to -\infty approaches 0. Hence, f(a) must be 0.

Hence, the only remaining possibility is a function f(x) such that \lim\limits_{x\to -\infty} f(x)=-a and f(-a)=0. Let us analyze this function. We know that f'(-a)=f(0). Also, for x\in(-a,0], f'(x)>f(0). Hence, f(0)>af'(-a). This implies that a<1. I haven’t yet determined how to eliminate this case.

Lie derivatives: a simple idea behind a messy calculation

I want to write about Lie derivatives. Because finding good proofs for Lie derivatives in books and on the internet is a lost cause. Because they have caused me a world of pain. Because we could all do with less pain.

In all that is written below, we assume that all Lie derivatives are being found in the direction of the vector field X, and that \phi_t is the flow along this vector field.

What is a Lie Derivative? A Lie derivative is a derivative, but for things more complicated than functions. Basically, for a given vector field X, T(p+tX)=f(p)+tL_XT (roughly speaking). Here T is a tensor, and a function is a special case of a tensor:

f(p+tX)=f(p)+tL_Xf=f(p)+tD_Xf, where D_Xf=L_Xf.

How do we then find a workable definition for a Lie derivative? Let us calculate the Lie derivative of a vector field.

Let me first try and explain what I will try and do. I know how functions and their derivatives behave. f(p+tX)\approx f(p)+ t(derivative of f in the X direction). We will use this simple derivative rule of f, wherever we can, to find out the what the Lie derivative of a vector field is.

For a vector field

Y, we have Y(p+tX)=Y(p)+t(L_XY)(p).

This, however does not make sense, as Y(p)+t(L_XY)(p) is a vector field at p, while Y(p+tX) is a vector field at p+tX. Hence, the correct definition should be


This is equivalent to the following definition:


This, in turn, is equivalent to the formulation


We shall now try to simplify these terms.


to first order in t. Similarly,

(\phi)^{-t}_*(Y(p+tX))(f)=Y(p+tX)(f\circ \phi^{-t}_*)=Y(p+tX)(f(p+tX)-tX(f))

Adding these two terms together, we get (L_XY(p))(f)=X(Y(f))-Y(X(f))

A minor technical point is that the left hand side is at p, while the right hand side is at p+tX. However, implicitly we have taken a limit t\to 0. Hence, as the vector fields are continuous, we get

(X(Y(f))-Y(X(f)))(p+tX)\to (X(Y(f))-Y(X(f)))(p)

This expression can be generalized really simply to tensors. Let us find out what the Lie derivative of a (0,n) tensor is:


The terms can be simplified in a similar way as above: T(p)(Y_1,\dots,Y_n) is a function. Hence, it is equal to

T(p+tX)(Y_1,\dots,Y_n)-tD_X(T(p+tX)(Y_1,\dots,Y_n)). On the other hand,

(\phi)^{-t}_*(T(p+tX)((Y_1,Y_2,\dots,Y_n))=T(Y_1\circ\phi^{-t}_*,\dots,Y_n\circ \phi^{-t})=T((Y_1-tL_XY_1)(p+tX),\dots,(Y_n-tL_XY_n)(p+tX))

This is easily simplified to give


Coming to grips with Special Relativity

Contrary to popular opinion, Special Relativity is not a more specialized, more involved part of General Relativity. It is the easier of the two Relativity theories, involving only thought experiments and Linear Algebra. However, despite having been exposed to ideas from this theory right from school, and also taking an advanced course (and doing well) in it, I have always felt that I don’t really understand this theory. And many people I know in grad school, those who have taken this and more advanced courses, feel the same way.

Reason for not really knowing what’s going on: Time dilation is explained by light clocks. But that’s just one kind of clock!! What if we had a different kind of clock? Would it still show that time is slowing down in a moving frame? These and other misunderstood thought experiments give one the impression that only our perception of time and length are changing. Time and length aren’t really changing. And this is despite accepting easily the two postulates of Special Relativity: that the speed of light is same in all frames, and that the laws of Physics are valid in all inertial frames.

The motivation of this article is that we need better thought experiments to understand Special Relativity. And the author, recently fueled by the brilliantly written autobiography of Einstein by Walter Isaacson, hopes to do just the same.

Length contraction

Maxwell’s laws specify the speed of light, and their formulation suggests that it should be the same in all inertial frames. Now imagine that you’re traveling in a train, and you have a 1 ft wide window. A window that is 1 ft while stationary, will appear to be 1 ft long when it is moving, if you’re moving with it. Hence, lengths don’t change while you’re in the same frame. Now if you’re observing from the platform, the light will travel across the window in time t. Similarly, if you’re inside the train, light will travel across the window in time t. All good. So what has changed? If I stand on the platform and observe the moving train, I can see that the relative velocity of the train and the light beam is low. Hence, if the window becomes shorter in the direction of motion of the light beam, all will be well. Hence, the window remains 1 ft in the frame of the moving train. But it shortens in the frame of reference of the platform.

Does this mean that there is no absolute length? There is! It is the length measured in the frame of reference of the window.

Time dilation

Now we’ll have to move perpendicular to the motion of the train. We all know the famous time clock, in which a light beam bounces off mirrors that are placed parallel to the motion of the train.


Here’s what you need to remember: when the light beam hits the mirror, you see it, the person sitting inside the train sees it, everyone sees it at the same moment. Alright. Here we go.

The light travels a longer distance, if you’re observing from the platform. Hence, as the speed of light is the same in both frames, the person standing on the platform should see the light beam reflecting from the top mirror and arriving at the bottom mirror in t seconds, while the person sitting inside the train should see the light beam arriving at the bottom mirror in, say, t' seconds. Clearly, t>t'. However, they both see the light arrive at the bottom mirror at the same moment. There can be no discrepancy about this. It is almost like t' expanded, although remaining of the same magnitude, and became equal to t. This is what is called time dilation. For the person standing on the platform, if they were to look inside the train, they would imagine the world moving at a slower rate. Just imagine a slow motion movie running inside the train.

But has time really expanded? Have lengths really contracted? No! For the observer sitting in the train, the same length contractions and time time dilations will happen for phenomena on the platform. Basically there are two kinds of length- the length observed from the frame of the object, and the length observed from a moving frame. And the length observed from the moving frame is always shorter. The same can be said about time dilation.