
### A note on the gradient of a function.

I want to insert a note on $grad$ $f(X)$ (gradient of $f(X)$).

1. It is not perpendicular to everything in the surface. Most proofs only go as far as showing that it is perpendicular to the tangent vectors of differentiable parameterized curves lying in the surface. Nothing more. Stop reading too deeply into it.

2. It is mostly useful for finding perpendiculars to all parameterized curves, rather than the parameterized curves themselves. The tangent line is an exception, since a straight line is completely determined by a normal vector and a point on it. For example, if we know that $a\overline{i}+b\overline{j}$ is perpendicular to a straight line passing through the point $P=(p_1,p_2)$, the line is $a(x-p_1)+b(y-p_2)=0$. Curves that are not straight lines cannot, in general, be recovered from just a perpendicular vector and a point through which the curve passes.
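To make the first point concrete, here is a small numerical sketch. The function $f(x,y,z)=x^2+y^2+z^2$ and the particular curve on the unit sphere are my own choices for illustration, not anything from a specific proof. It checks that the gradient is perpendicular to the tangent vector of a curve lying in the level surface $f=1$.

```python
import math

def grad_f(x, y, z):
    # Gradient of the illustrative function f(x, y, z) = x**2 + y**2 + z**2
    return (2 * x, 2 * y, 2 * z)

def curve(t):
    # A hypothetical curve chosen so that it stays on the level surface f = 1
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def curve_velocity(t, h=1e-6):
    # Central-difference approximation of the tangent vector c'(t)
    ahead, behind = curve(t + h), curve(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def grad_dot_tangent(t):
    # Should be (numerically) zero for every t
    return dot(grad_f(*curve(t)), curve_velocity(t))

if __name__ == "__main__":
    for t in (0.0, 0.7, 1.9, 3.1):
        print(t, grad_dot_tangent(t))
```

The check says nothing about vectors that are not tangent to some curve in the surface, which is exactly the limitation the note is about.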

### The chain rule in multi-variable calculus: Generalized

Now we’ll discuss the chain rule for $n$-nested functions. For example, an $n$-nested function would be $g=f_1(f_2(\dots(f_n(t))\dots))$. What would $\frac{\partial g}{\partial t}$ be?

We know that

$g(t+h)-g(t)=\frac{f_1(f_2(\dots(f_n(t+h))\dots))-f_1(f_2(\dots(f_n(t))\dots))}{f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)}.\left[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)\right]$.

If every $f_i$ is differentiable, then

$g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.\left[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)\right]+g_1$

where $g_1$ is negligible compared with the bracketed difference as that difference tends to $0$. Since the bracketed difference is itself of order $h$, this is equivalent to saying $\lim\limits_{h\to 0}\frac{g_1}{h}=0$.

In turn

$f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)=\frac{\partial f_2}{\partial f_3}.\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]+g_2$

such that $\lim\limits_{h\to 0}\frac{g_2}{h}=0$.

Hence, we have

$g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.\left(\frac{\partial f_2}{\partial f_3}.\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]+g_2\right)+g_1$

Continuing like this, we get the formula

$g(t+h)-g(t)=\frac{\partial f_1}{\partial f_2}.\left(\frac{\partial f_2}{\partial f_3}.\left(\dots\left(\frac{\partial f_n}{\partial t}.h+g_n\right)+g_{n-1}\dots\right)+g_2\right)+g_1$

such that $\lim\limits_{h\to 0}\frac{g_i}{h}=0$ for all $i\in \{1,2,3,\dots,n\}$.

Dividing the above formula by $h$ and letting $h\to 0$, every term containing a $g_i$ vanishes, and we get

$\lim\limits_{h\to 0}\frac{g(t+h)-g(t)}{h}=\frac{\partial f_1}{\partial f_2}.\frac{\partial f_2}{\partial f_3}.\dots\frac{\partial f_n}{\partial t}$
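A quick numerical sanity check of the product formula may help. The three functions below ($\sin$, $\exp$, and squaring) are an arbitrary chain I picked for illustration. The sketch compares the product of the individual derivatives, each evaluated at the right inner value, with the difference quotient $\frac{g(t+h)-g(t)}{h}$ for a small $h$.

```python
import math

# Hypothetical example chain: g(t) = f1(f2(f3(t))) = sin(exp(t**2))
funcs = [math.sin, math.exp, lambda t: t * t]    # f1, f2, f3
derivs = [math.cos, math.exp, lambda t: 2 * t]   # f1', f2', f3'

def compose(t):
    # Apply the innermost function first, the outermost last
    for f in reversed(funcs):
        t = f(t)
    return t

def chain_rule_derivative(t):
    # inputs[j] is the argument fed to the j-th function counted from the inside:
    # [t, f3(t), f2(f3(t))]
    inputs = [t]
    for f in reversed(funcs[1:]):
        inputs.append(f(inputs[-1]))
    # Multiply f3'(t) * f2'(f3(t)) * f1'(f2(f3(t)))
    product = 1.0
    for d, x in zip(reversed(derivs), inputs):
        product *= d(x)
    return product

def difference_quotient(t, h=1e-6):
    # Forward difference (g(t+h) - g(t)) / h
    return (compose(t + h) - compose(t)) / h

if __name__ == "__main__":
    t = 0.5
    print(chain_rule_derivative(t), difference_quotient(t))
```

The two printed numbers should agree to several decimal places; the small gap that remains is the contribution of the $g_i$ terms at a finite $h$.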

### Multi-variable differentiation.

There are very many bad books on multivariable calculus. “A Second Course in Calculus” by Serge Lang is the rare good book in this area. Succinct, thorough, and rigorous. This is an attempt to re-create some of the more orgasmic portions of the book.

In $\Bbb{R}^n$ space, should differentiation be defined as $\lim\limits_{H\to 0}\frac{f(X+H)-f(X)}{H}$? No, as division by a vector $(H)$ is not defined. Then $\lim\limits_{\|H\|\to 0}\frac{f(X+H)-f(X)}{\|H\|}$? We’re not sure. Let us see how it goes.
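For a feel of what the second candidate does, here is a tiny experiment with the function $f(x,y)=x$ at the origin (my own choice of example). It evaluates $\frac{f(X+H)-f(X)}{\|H\|}$ for small $H$ pointing in different directions.

```python
import math

def f(x, y):
    # Illustrative function: just the first coordinate
    return x

def norm_quotient(X, H):
    # (f(X + H) - f(X)) / ||H||
    shifted = (X[0] + H[0], X[1] + H[1])
    return (f(*shifted) - f(*X)) / math.hypot(*H)

if __name__ == "__main__":
    origin = (0.0, 0.0)
    for H in ((1e-6, 0.0), (-1e-6, 0.0), (0.0, 1e-6)):
        print(H, norm_quotient(origin, H))
```

Even for this very tame function the quotient depends on the direction from which $H$ approaches $0$, so the candidate cannot be taken as the definition without modification.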

Something that is easy to define is $f(X+H)-f(X)$, which can be written as

$f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)$ ($H$ is the $n$-tuple $(h_1,h_2,\dots,h_n)$).

This expression in turn can be written as

$f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\left[f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2+h_2,\dots,x_n+h_n)\right]\\+\left[f(x_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n+h_n)\right]+\dots+\left[f(x_1,x_2,\dots,x_{n-1},x_n+h_n)-f(x_1,x_2,\dots,x_{n-1},x_n)\right]$.
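The decomposition is easier to trust after seeing it computed. The sketch below builds the brackets one coordinate at a time for an arbitrary test function (`sample_f` is a made-up example) so that one can confirm they add up to $f(X+H)-f(X)$.

```python
import math

def telescoping_terms(f, X, H):
    # The k-th term (0-based) removes the shift h_k from coordinate k only:
    # coordinates before k are already unshifted, coordinates after k are still shifted.
    n = len(X)
    terms = []
    for k in range(n):
        upper = list(X[:k]) + [X[i] + H[i] for i in range(k, n)]
        lower = list(X[:k + 1]) + [X[i] + H[i] for i in range(k + 1, n)]
        terms.append(f(*upper) - f(*lower))
    return terms

def sample_f(x, y, z):
    # Hypothetical test function, chosen only for illustration
    return x * y + math.sin(z) + x * z * z

if __name__ == "__main__":
    X = (1.0, 2.0, 0.5)
    H = (0.1, -0.2, 0.3)
    terms = telescoping_terms(sample_f, X, H)
    total = sample_f(*(x + h for x, h in zip(X, H))) - sample_f(*X)
    print(terms, sum(terms), total)
```

Each bracket differs in exactly one coordinate, which is what makes a one-variable tool applicable to it.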

Here, we can use the Mean Value Theorem, because each bracket changes only one coordinate. Let us suppose $s_1$ lies between $x_1$ and $x_1+h_1$,

or in general

$s_k$ lies between $x_k$ and $x_k+h_k$. Then

$f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\\ \displaystyle{\sum\limits_{k=1}^n{D_{x_k}f(x_1,\dots,x_{k-1},s_k,x_{k+1}+h_{k+1},\dots,x_n+h_n).h_k}}$.

No correction factor. Just this.

What follows is that a function

$g_k=D_{x_k}f(x_1,\dots,x_{k-1},s_k,x_{k+1}+h_{k+1},\dots,x_n+h_n)-D_{x_k}f(x_1,x_2,\dots,x_k,\dots,x_n)$

is assigned for every $k\in\{1,2,3,\dots,n\}$.

Hence, the expression becomes

$f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\sum\limits_{k=1}^n {\left[D_{x_k}f(x_1,x_2,\dots,x_n)+g_k\right].h_k}$

If the partial derivatives are continuous at $(x_1,x_2,\dots,x_n)$, it is easy to determine that $\lim\limits_{H\to 0}g_k=0$.

The more interesting question to ask here is: why did we use the Mean Value Theorem? Why could we not have used the formula $f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)\\=\sum\limits_{k=1}^n {\left[D_{x_k}f(x_1,\dots,x_k,x_{k+1}+h_{k+1},\dots,x_n+h_n)+g_k(x_1,\dots,x_k,x_{k+1}+h_{k+1},\dots,x_n+h_n,h_k)\right].h_k}$,

where $\lim\limits_{h_k\to 0}g_k(x_1,\dots,x_k,x_{k+1}+h_{k+1},\dots,x_n+h_n,h_k)=0$?

This is because $g_k(x_1,\dots,x_k,x_{k+1}+h_{k+1},\dots,x_n+h_n,h_k)$ may not be defined at the point $(x_1,x_2,\dots,x_n)$. If in fact every $g_k$ is continuous at $(x_1,x_2,\dots,x_n)$, then we wouldn’t have to use the Mean Value Theorem.

Watch this space for some more expositions on this topic.

A function is differentiable at $X$ if it can be expressed in this manner: $f(X+H)-f(X)=(\text{grad}\,f(X)).H+\|H\|g(X,H)$ such that $\lim\limits_{\|H\|\to 0}g(X,H)=0$. This is a necessary and sufficient condition; the definition of differentiability. It does not have a derivation. I spent a very long time trying to derive it before realising what a fool I had been.
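Although the definition has no derivation, it can at least be watched in action. In the sketch below the function $f(x,y)=x^2y+\sin y$ and its hand-computed gradient are assumptions of mine. The code solves the defining equation for $g(X,H)$ and evaluates it for shrinking $H$.

```python
import math

def f(x, y):
    # Illustrative function, chosen arbitrarily
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Gradient of the function above, computed by hand
    return (2 * x * y, x * x + math.cos(y))

def remainder(X, H):
    # g(X, H) = (f(X + H) - f(X) - grad f(X).H) / ||H||
    x, y = X
    h1, h2 = H
    gx, gy = grad_f(x, y)
    linear_part = gx * h1 + gy * h2
    return (f(x + h1, y + h2) - f(x, y) - linear_part) / math.hypot(h1, h2)

if __name__ == "__main__":
    X = (1.0, 2.0)
    for scale in (1e-1, 1e-2, 1e-3, 1e-4):
        print(scale, remainder(X, (scale, scale)))
```

The printed values shrink roughly in proportion to $\|H\|$, which is what $\lim\limits_{\|H\|\to 0}g(X,H)=0$ looks like for a smooth function.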