# Notation in Riemannian Geometry

I have always found the notation in Riemannian Geometry to be very confusing. How and why are we doing the things that we’re doing? What does all this abstruse notation mean? This is my attempt to write a helpful guide for anyone starting out in this field.

1. Why the affine connection? Why this notion of derivative in particular?

A common sentiment in mathematical circles is that we need a coordinate invariant notion of a derivative. When we write $\frac{\partial f}{\partial x}$, we are specifying a Euclidean coordinate chart, with respect to which we differentiate the function $f$. But Euclidean charts are not always the most convenient setting for calculations: sometimes we need polar coordinates, for instance. Hence, if we could represent equations in a way that does not assume a coordinate chart, it would make life much simpler for us. There would be no complicated Euclidean-to-polar coordinate conversion operations, for example.

Let us now dig slightly deeper into what a coordinate invariant mathematical expression actually means. Suppose we have a physical law saying that a quantity $f$ exists such that $\frac{\partial f}{\partial x}=1$. Now if we have a transformation $x\to x'$ such that $x'=\frac{1}{2}x$, then we know that this law cannot hold true anymore. This is because if $\frac{\partial f}{\partial x}=1$, then by the chain rule $\frac{\partial f}{\partial x'}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial x'}=2$. Hence, when we state this physical law, we also have to specify the coordinate system in which it holds.
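This coordinate-change computation can be checked mechanically. A minimal sketch with sympy, where $f=x$ is an illustrative choice of a quantity satisfying the "law" $\frac{\partial f}{\partial x}=1$:

```python
import sympy as sp

# A quantity f satisfying the "law" df/dx = 1 in the original chart.
x, xp = sp.symbols("x x'")
f = x

# Change coordinates: x' = x/2, i.e. x = 2*x'.
f_in_xp = f.subs(x, 2 * xp)

print(sp.diff(f, x))         # 1: the law holds in the x chart
print(sp.diff(f_in_xp, xp))  # 2: the same law fails in the x' chart
```

The numerical value of the derivative changes with the chart, which is exactly why the law is not coordinate invariant as stated.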

Much importance, at least in Physics, is given to the fact that there is no preferred coordinate system. All inertial frames obey the same Newton’s laws. In Special Relativity, we seek quantities that are Lorentz invariant. Why can we not just specify the coordinate system each time we state a law? Because things get unmanageable and cumbersome if we propose a different law for each moving reference frame. Moreover, these “laws” might also change when we change units of space and time: in fact, a choice of units is itself a choice of coordinate system. Therefore, physical laws should remain invariant regardless of whether we choose metres or feet, and regardless of whether we choose Euclidean or polar coordinates. We can then pick whichever coordinate system simplifies the calculation most, and arrive at the answer.

Now that we’ve established that we need a coordinate invariant notion of a derivative, why the affine connection in particular? Mainly because the properties $\nabla g=0$ (metric compatibility) and $\nabla_X Y-\nabla_Y X=[X,Y]$ (torsion-freeness) simplify a lot of calculations. These are just constraints that we impose on the definition, and by the fundamental theorem of Riemannian geometry they single out a unique connection, the Levi-Civita connection. We could have imposed other constraints instead, and perhaps obtained a different unique connection.
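Both defining properties can be verified mechanically in a concrete chart. A sketch with sympy, taking the polar-coordinate metric $g=\mathrm{diag}(1,r^2)$ as an example: the connection coefficients are computed from the standard Christoffel formula, and metric compatibility $\nabla g=0$ is checked componentwise.

```python
import sympy as sp

# Polar coordinates (r, theta) with metric g = diag(1, r^2).
r, th = sp.symbols("r theta", positive=True)
coords = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])
ginv = g.inv()
n = 2

# Christoffel symbols of the Levi-Civita connection:
# Gamma^k_{ij} = (1/2) g^{kl} (d_i g_{lj} + d_j g_{li} - d_l g_{ij})
Gamma = [[[sum(sp.Rational(1, 2) * ginv[k, l]
               * (sp.diff(g[l, j], coords[i]) + sp.diff(g[l, i], coords[j])
                  - sp.diff(g[i, j], coords[l]))
               for l in range(n))
           for j in range(n)] for i in range(n)] for k in range(n)]

print(Gamma[0][1][1])  # Gamma^r_{theta theta} = -r
print(Gamma[1][0][1])  # Gamma^theta_{r theta} = 1/r

# Metric compatibility: nabla_k g_{ij}
#   = d_k g_{ij} - Gamma^l_{ki} g_{lj} - Gamma^l_{kj} g_{il} = 0 for all k, i, j.
for k in range(n):
    for i in range(n):
        for j in range(n):
            cov = (sp.diff(g[i, j], coords[k])
                   - sum(Gamma[l][k][i] * g[l, j] for l in range(n))
                   - sum(Gamma[l][k][j] * g[i, l] for l in range(n)))
            assert sp.simplify(cov) == 0
```

The torsion-free property is built in here: the formula is symmetric in the two lower indices $i$ and $j$.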

How do we know connections are coordinate invariant? Because a connection is defined without reference to any coordinate chart, and it is tensorial in its direction argument: $\nabla_X=X^i\nabla_i$ for any choice of basis. Hence, the coordinate invariance property follows from the definition itself. When we don’t single out a specific coordinate system, and claim that a certain mathematical expression holds in general, we have written down a coordinate invariant expression. This is exactly what we do here.

Another important point to note is that because every connection satisfies the same Leibniz (product) rule, the Leibniz terms cancel in the difference of two connections, making the difference $C^\infty$-linear in both arguments; the difference of two connections is therefore always a tensor. Hence, if $\nabla$ is the affine connection and $\nabla'$ is any other connection, $\nabla-\nabla'$ is a tensor. How do we define the affine connection in an intuitive way, then? We seem to have a lot of choice, as we can take any already defined connection $\nabla'$ and write $\nabla=\nabla'+$ some tensor. In a coordinate chart, we can choose $\nabla'$ to be componentwise (Euclidean) differentiation, and the correction is then given by the Christoffel symbols. This allows us to interpret the affine connection as a “correction” to regular differentiation.
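The one-line computation behind the cancellation claim: apply both connections to $fY$ and subtract, using the Leibniz rule each satisfies.

```latex
\begin{align*}
(\nabla - \nabla')_X (fY)
  &= \bigl(X(f)\,Y + f\,\nabla_X Y\bigr) - \bigl(X(f)\,Y + f\,\nabla'_X Y\bigr) \\
  &= f\,(\nabla - \nabla')_X Y.
\end{align*}
```

The $X(f)\,Y$ terms cancel, so $\nabla-\nabla'$ is function-linear in $Y$; linearity in $X$ holds for each connection separately, so the difference is a tensor.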

2. Why do we deal with abstract notation at all? Why do we have something like $g^{IJ}\nabla_I\nabla_KT_{JL}$? The indices show what kinds of mathematical objects we are dealing with. The lower index on $\nabla_I$, for instance, tells us that it takes in a vector. In $g^{IJ}\nabla_I\nabla_KT_{JL}$, the indices $I$ and $J$ are contracted, leaving two free lower indices $K$ and $L$; hence, when this object accepts $2$ vectors, it becomes a function.

Let us now consider the tensor $g^{IJ}\nabla_I\nabla_KT_{JL}(X,Y)$. How do we know where $X$ and $Y$ go? Does $X$ go to the $\nabla_I$ or to the $T$? We solve this conundrum by the following rule: $X$ goes to the left-most free slot, and $Y$ goes to the left-most free slot after that. Since $I$ and $J$ are already contracted with each other, another, perhaps clearer, way of saying this is that we contract $X$ with the $K$ index and $Y$ with the $L$ index. We use this notation because of the tensorial nature of this mathematical object: when a vector $X$ fills a slot like $\nabla_K$, we get $X^K\nabla_K$.

So is that it? Is this expression equal to $g^{IJ}\nabla_I\nabla_KT_{JL}X^K Y^L$? Yep. It’s as simple as that. But this doesn’t “mean” anything. Let me try to elaborate on this statement. This is just abstract notation. Fluff. Refined nonsense. We know which vector goes where. We have some information about the mathematical object we are dealing with. However, performing actual calculations is a completely different beast.
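In components, this contraction pattern is exactly an einsum. A numerical sketch with random arrays (the values are meaningless; only the index bookkeeping matters, and `D2T` is a made-up stand-in for the components of $\nabla\nabla T$ with slot order $(I,K,J,L)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
g_inv = rng.standard_normal((n, n))      # components g^{IJ}
D2T = rng.standard_normal((n, n, n, n))  # components (nabla nabla T) in slot order (I, K, J, L)
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

# g^{IJ} (nabla_I nabla_K T_{JL}) X^K Y^L:
# I and J are summed against the metric; X contracts with the free index K,
# Y with the free index L, producing a scalar.
scalar = np.einsum("ij,ikjl,k,l->", g_inv, D2T, X, Y)

# Equivalently: first contract I with J via the metric, leaving a (0,2)-tensor
# in (K, L), then feed in X and Y.
T2 = np.einsum("ij,ikjl->kl", g_inv, D2T)
assert np.isclose(scalar, X @ T2 @ Y)
```

The two routes agree because contraction order does not matter; that is precisely what the index notation encodes.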

How do the calculations go, though? We first select a coordinate system and a basis for the tangent space. We then perform the tensorial differentiation via the connection $\nabla$. It is only then that we plug the vectors into the right slots. What does the $g^{IJ}$ do? There are again two levels of understanding: one level is manipulating this expression abstractly, and another is actually choosing a coordinate system and calculating the final expression. At the abstract level, we can just write this as $\nabla^J\nabla_KT_{JL}$. Now the actual calculation: suppose we choose an orthonormal basis $\{e_i\}$. Then we can write $g^{IJ}\nabla_I\nabla_KT_{JL}(X,Y)$ as $\sum\limits_{i}\nabla\nabla T(e_i,X,e_i,Y)$, where the four slots correspond to the indices $(I,K,J,L)$, and then simplify. Let us simplify this particular expression. It becomes $\sum\limits_{i}\nabla_{e_i}(\nabla T)(X,e_i,Y)$, which expands by the Leibniz rule to $$\sum\limits_{i}\Big[e_i\big((\nabla T)(X,e_i,Y)\big)-(\nabla T)(\nabla_{e_i}X,e_i,Y)-(\nabla T)(X,\nabla_{e_i}e_i,Y)-(\nabla T)(X,e_i,\nabla_{e_i}Y)\Big].$$

Each of these terms can also be simplified using the same rules of tensorial differentiation. Hence, the actual calculation is a long iterative process. When we deal with these expressions abstractly, however, manipulations are generally substantially shorter.

3. Whenever we perform calculations at a point, they become substantially shorter and easier. Why? And what does performing a calculation at a point even mean? When we select a tensorial operation and vector or co-vector fields to operate on, we are selecting global entities: all of the mathematical objects defined above are defined over the whole space or manifold. However, if $X|_p=\tilde{X}|_p$, then $T(X)|_p=T(\tilde{X})|_p$, where $T$ is a tensor. The utility of this fact is that given a complicated vector field $X$, we can choose a really simple $\tilde{X}$ with special properties that will make life easy, as long as it agrees with $X$ at $p$. For instance, we can always choose a frame $\{\tilde{e}_i\}$ around $p$ such that $\nabla_{\tilde{e}_i}\tilde{e}_j=0$ at $p$. This substantially simplifies calculations. However, the most common pre-requisite for such drastic simplifications is that we are dealing with tensors, which is something one should always check.

4. Raising and lowering indices- We know that a lowered index means that the tensor accepts vectors, and a raised index means that it accepts co-vectors. However, why do we raise a lowered index, and vice-versa? We will first talk about lowering an index. Consider a vector $V^I$. We can lower that index via $g_{IJ}V^I=V_J$. In common parlance, we say that $V_J$ is now a co-vector. But how did we magically get a co-vector by just multiplying with $g_{IJ}$? Let us see what happens when we contract a vector $W^J$ with $V_J$. We get $V_JW^J=g_{IJ}V^IW^J$, which is the inner product of the two vectors! Hence, $g_{IJ}$ converted a vector into a co-vector because it transformed $V\to \langle V,- \rangle$.
The same can be said about the raising of indices for co-vectors. Because inner products of all kinds of tensors are defined only using the metric, $g_{IJ}$ or $g^{IJ}$ are involved in raising or lowering indices for all tensors.
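A numerical sketch of lowering and raising with a made-up $2\times 2$ metric, checking that the lowered index really does compute the inner product:

```python
import numpy as np

# A symmetric positive-definite metric (an arbitrary illustrative choice).
g = np.array([[2.0, 1.0],
              [1.0, 3.0]])
V = np.array([1.0, 2.0])
W = np.array([4.0, -1.0])

V_lower = g @ V                            # lower the index: V_J = g_{IJ} V^I
assert np.isclose(V_lower @ W, V @ g @ W)  # V_J W^J = g_{IJ} V^I W^J = <V, W>

# Raising undoes lowering, via the inverse metric g^{IJ}.
g_inv = np.linalg.inv(g)
assert np.allclose(g_inv @ V_lower, V)
```

Note that with the Euclidean metric $g=\mathrm{id}$, lowering does nothing to the components, which is why the distinction is invisible in ordinary vector calculus.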

5. What does $\delta^I_J$ do exactly?
Essentially, $\delta^I_J$ is the identity: given a basis $\{e_j\}$ with dual basis $\{\omega^i\}$, we have $\delta(\omega^i,e_j)=\omega^i(e_j)=1$ if and only if $i=j$. In other words, its components are $1$ when $I=J$ and $0$ otherwise, in every basis. This implies that $\delta^I_JV^J=V^I$: contracting with $\delta$ returns the vector unchanged.
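In components, $\delta^I_J$ is just the identity matrix, so the contraction is a one-line numpy check:

```python
import numpy as np

n = 4
delta = np.eye(n)               # components of delta^I_J in any basis
V = np.arange(1.0, n + 1)       # arbitrary components V^J

# delta^I_J V^J = V^I: the contraction returns V unchanged.
assert np.allclose(np.einsum("ij,j->i", delta, V), V)
```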

6. Who came first, the dual or the metric? Consider the differential $df$ of a function $f$, which is a co-vector; its dual vector, the gradient $\nabla f$, can only be defined in terms of the metric. Hence, we can conclude that duals are not defined independently of the metric. The dual of a vector $V^I$ is the co-vector $V_J$ such that, given any vector $W$, $V_JW^J=\langle V, W\rangle$. In fact, the dual $V_J$ doesn’t have to be such that $V_JV^J=1$. The value of $V_JV^J=g_{IJ}V^IV^J$ completely depends on the metric.
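As a concrete instance: in polar coordinates, raising the index of $df$ with the inverse metric $g^{ij}$ produces the coordinate-basis components of the gradient. A sympy sketch:

```python
import sympy as sp

# Polar coordinates with metric g = diag(1, r^2), so g^{-1} = diag(1, 1/r^2).
r, th = sp.symbols("r theta", positive=True)
g_inv = sp.diag(1, 1 / r**2)
f = sp.Function("f")(r, th)

# Components of the co-vector df, then raise the index: (grad f)^i = g^{ij} d_j f.
df = sp.Matrix([sp.diff(f, r), sp.diff(f, th)])
grad_f = g_inv * df

print(grad_f)  # components (df/dr, (1/r^2) df/dtheta) in the coordinate basis
```

Had we used the Euclidean metric instead, $\nabla f$ and $df$ would have identical components, which is why the dependence on the metric is easy to miss.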

7. What does it mean to raise the index of $\nabla$? In other words, what does $\nabla^J=g^{IJ}\nabla_I$ mean? When we contract a co-vector $W$ with $\nabla^J$ in the form $\nabla^JW_J$, what we are really computing is $g^{IJ}\nabla_I W_J$. However, $\nabla$ is an operator that acts on other tensors. Hence, the actual calculation is a completely different story than this abstract nonsense. For a co-vector $W$, we have $\nabla^JW_J=g^{IJ}\nabla_IW_J=g^{IJ}(\nabla W)(e_I,e_J)$, which in an orthonormal basis $\{e_i\}$ becomes $\sum\limits_{i}(\nabla W)(e_i,e_i)$. Like before, the actual calculation requires substantial simplification before we can just bring in the vectors and sum over everything.
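A numerical sketch of this final contraction, in an orthonormal basis where $g^{IJ}$ is the identity (here `DW` is a made-up stand-in for the matrix of components of $\nabla W$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
g_inv = np.eye(n)                    # orthonormal basis: g^{IJ} = delta^{IJ}
DW = rng.standard_normal((n, n))     # illustrative components (nabla W)_{IJ}

# g^{IJ} (nabla W)_{IJ} reduces to the trace of the component matrix.
contraction = np.einsum("ij,ij->", g_inv, DW)
assert np.isclose(contraction, np.trace(DW))
```

In a non-orthonormal basis the same einsum with the true $g^{IJ}$ applies; the trace shortcut is what the choice of orthonormal basis buys us.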