A note on the gradient of a function.

I want to insert a note on grad f(X) (gradient of f(X)).

1. It is not perpendicular to everything in the surface. Most proofs only go as far as proving that it is perpendicular to the tangent vectors of differentiable parameterized curves lying in the surface. Nothing more. Stop reading too deeply into it.

2. It is mostly useful for finding perpendiculars to all parameterized curves, rather than the parameterized curves themselves. The tangent line is an exception, since a normal vector and a point are enough to determine a straight line. For example, if we know that a\overline{i}+b\overline{j} is perpendicular to a straight line passing through the point P=(x_0,y_0), the line is a(x-x_0)+b(y-y_0)=0. Curves that are not straight lines cannot, in general, be determined from just a perpendicular vector and a point through which the curve passes.
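
Point 1 can be sanity-checked numerically. This is only a sketch: the function f(x,y,z)=x^2+y^2+z^2, the particular curve on the unit sphere, and all the helper names are my own choices for illustration. It prints the dot product of grad f with a finite-difference tangent of the curve at a few parameter values.

```python
import math

def grad_f(x, y, z):
    # Gradient of the illustrative function f(x, y, z) = x^2 + y^2 + z^2.
    return (2 * x, 2 * y, 2 * z)

def curve(t):
    # A differentiable curve lying in the level surface f = 1 (the unit sphere).
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def tangent(t, h=1e-6):
    # Central-difference approximation of the curve's velocity vector.
    p, q = curve(t + h), curve(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Dot product of the gradient with the tangent at several parameter values.
dots = [dot(grad_f(*curve(t)), tangent(t)) for t in (0.0, 0.3, 1.1, 2.5)]
print(dots)  # every entry should be close to 0
```

Note that the check says something only about tangents of curves in the surface, which is exactly as far as the usual proof goes.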

The chain rule in multi-variable calculus: Generalized

Now we’ll discuss the chain rule for n-nested functions. For example, an n-nested function would be g=f_1(f_2(\dots(f_n(t))\dots)). What would \frac{\partial g}{\partial t} be?

We know that

g(t+h)-g(t)=f_1(f_2(\dots(f_n(t+h))\dots))-f_1(f_2(\dots(f_n(t))\dots)).

If f_1 is differentiable and f_2,\dots,f_n are continuous, then

g(t+h)-g(t)=\left(\frac{\partial f_1}{\partial f_2}+g_1\right)\cdot\left[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)\right]

such that \lim_{[f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)]\to 0}g_1=0 (take g_1=0 whenever the bracketed difference is 0). By continuity, the bracketed difference tends to 0 as h\to 0, so this implies \lim\limits_{h\to 0}g_1=0.

In turn, if f_2 is differentiable,

f_2(\dots(f_n(t+h))\dots)-f_2(\dots(f_n(t))\dots)=\left(\frac{\partial f_2}{\partial f_3}+g_2\right)\cdot\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]

such that \lim\limits_{h\to 0}g_2=0.

Hence, we have

g(t+h)-g(t)=\left(\frac{\partial f_1}{\partial f_2}+g_1\right)\cdot\left(\frac{\partial f_2}{\partial f_3}+g_2\right)\cdot\left[f_3(\dots(f_n(t+h))\dots)-f_3(\dots(f_n(t))\dots)\right]

Continuing like this, with every f_i differentiable, we get the formula

g(t+h)-g(t)=\left(\frac{\partial f_1}{\partial f_2}+g_1\right)\cdot\left(\frac{\partial f_2}{\partial f_3}+g_2\right)\cdots\left(\frac{\partial f_n}{\partial t}+g_n\right)\cdot h

such that \lim\limits_{h\to 0}g_i=0 for all i\in \{1,2,3,\dots,n\}.

Dividing by h and letting h\to 0, we get

\lim\limits_{h\to 0}\frac{g(t+h)-g(t)}{h}=\frac{\partial f_1}{\partial f_2}\cdot\frac{\partial f_2}{\partial f_3}\cdots\frac{\partial f_n}{\partial t}
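
The product of derivatives can be compared against a plain difference quotient. A minimal sketch, assuming a three-fold nesting g(t)=\sin(\exp(t^2+1)); the functions, the point t0 and the step h are my own picks, nothing canonical.

```python
import math

# Illustrative f_1, f_2, f_3 (outermost first) and their derivatives.
funcs = [math.sin, math.exp, lambda u: u ** 2 + 1.0]
derivs = [math.cos, math.exp, lambda u: 2.0 * u]

def compose(t):
    # g(t) = f_1(f_2(f_3(t))): apply the innermost function first.
    for f in reversed(funcs):
        t = f(t)
    return t

def chain_rule(t):
    # Multiply the derivatives, each evaluated at the value fed into its
    # own function, working from the innermost function outwards.
    product = 1.0
    for f, df in zip(reversed(funcs), reversed(derivs)):
        product *= df(t)
        t = f(t)
    return product

t0, h = 0.4, 1e-6
difference_quotient = (compose(t0 + h) - compose(t0 - h)) / (2 * h)
print(chain_rule(t0), difference_quotient)  # the two values should nearly agree
```

Each factor is evaluated at the value passed into that particular function, which is what \frac{\partial f_k}{\partial f_{k+1}} means in the formula.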

Multi-variable differentiation.

There are very many bad books on multivariable calculus. “A Second Course in Calculus” by Serge Lang is the rare good book in this area. Succinct, thorough, and rigorous. This is an attempt to re-create some of the more orgasmic portions of the book.

In \Bbb{R}^n space, should differentiation be defined as \lim\limits_{H\to 0}\frac{f(X+H)-f(X)}{H}? No, as division by a vector (H) is not defined. Then \lim\limits_{\|H\|\to 0}\frac{f(X+H)-f(X)}{\|H\|}? We’re not sure. Let us see how it goes.
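
One way to see how it goes is to try this quotient on the simplest function I can think of. The sketch below assumes f(x,y)=x (my choice, purely for illustration) and lets H shrink towards 0 along a few fixed directions. If the printed values depend on the direction, the quotient cannot have a single limit.

```python
import math

def f(x, y):
    # Illustrative function, chosen only because it is as simple as possible.
    return x

def norm_quotient(x, y, theta, size):
    # (f(X+H) - f(X)) / ||H|| with H = size * (cos(theta), sin(theta)).
    hx, hy = size * math.cos(theta), size * math.sin(theta)
    return (f(x + hx, y + hy) - f(x, y)) / math.hypot(hx, hy)

for theta in (0.0, math.pi / 3, math.pi / 2):
    # Shrink H along a fixed direction and watch the quotient.
    print(theta, [norm_quotient(0.0, 0.0, theta, s) for s in (1e-1, 1e-3, 1e-5)])
```

For this f the quotient works out to \cos\theta whatever the size of H, so it settles on a different value for each direction.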

Something that is easy to define is f(X+H)-f(X), which can be written as

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n) (H is the n-tuple (h_1,h_2,\dots,h_n)).

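This expression in turn can be written as

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2+h_2,\dots,x_n+h_n)\\+f(x_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,x_3+h_3,\dots,x_n+h_n)\\+\dots+f(x_1,x_2,\dots,x_{n-1},x_n+h_n)-f(x_1,x_2,\dots,x_n).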

Here, we can use the Mean Value Theorem, one coordinate at a time. Let us suppose s_1 lies between x_1 and x_1+h_1, so that (s_1,x_2+h_2,\dots,x_n+h_n) lies on the segment joining (x_1+h_1,x_2+h_2,\dots,x_n+h_n) and (x_1,x_2+h_2,\dots,x_n+h_n),

or in general

s_k lies between x_k and x_k+h_k, so that (x_1,\dots,x_{k-1},s_k,x_{k+1}+h_{k+1},\dots,x_n+h_n) lies on the segment joining (x_1,x_2,\dots,x_k+h_k,\dots,x_n+h_n) and (x_1,x_2,\dots,x_k,\dots,x_n+h_n). Then

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\\ \displaystyle{\sum\limits_{k=1}^n{D_{x_k}f(x_1,\dots,x_{k-1},s_k,x_{k+1}+h_{k+1},\dots,x_n+h_n)\cdot h_k}}.

No correction factor. Just this.

What follows is that a function

g_k=D_{x_k}f(x_1,\dots,x_{k-1},s_k,x_{k+1}+h_{k+1},\dots,x_n+h_n)-D_{x_k}f(x_1,x_2,\dots,x_n)

is assigned for every k\in\{1,2,3,\dots,n\}.

Hence, the expression becomes

f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)=\sum\limits_{k=1}^n {\left[D_{x_k}f(x_1,x_2,\dots,x_n)+g_k\right]\cdot h_k}

It is easy to determine that \lim\limits_{H\to 0}g_k=0, provided every partial derivative D_{x_k}f is continuous at (x_1,x_2,\dots,x_n).
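
The coordinate-by-coordinate decomposition can be checked numerically. A minimal sketch, assuming the function f(x,y,z)=xy+yz^2 and the point X=(1,2,3) (both my own illustrative choices). It changes one coordinate at a time, confirms the pieces add up to f(X+H)-f(X), and prints each g_k computed as (k-th piece)/h_k minus the k-th partial derivative at X.

```python
def f(x, y, z):
    # Illustrative smooth function (an assumption, not from any text).
    return x * y + y * z ** 2

def partials(x, y, z):
    # Exact partial derivatives D_x f, D_y f, D_z f of the function above.
    return (y, x + z ** 2, 2 * y * z)

def telescoping_terms(X, H):
    # The k-th term changes only coordinate k: coordinates before k are
    # already at x_i, coordinates after k are still at x_i + h_i.
    n = len(X)
    terms = []
    for k in range(n):
        upper = [X[i] if i < k else X[i] + H[i] for i in range(n)]
        lower = [X[i] if i <= k else X[i] + H[i] for i in range(n)]
        terms.append(f(*upper) - f(*lower))
    return terms

X = (1.0, 2.0, 3.0)
for scale in (1e-1, 1e-2, 1e-3):
    H = (scale, -2 * scale, 0.5 * scale)
    terms = telescoping_terms(X, H)
    total = f(*(x + h for x, h in zip(X, H))) - f(*X)
    # g_k = (k-th term) / h_k - D_{x_k} f(X); it should shrink with H.
    g = [t / h - d for t, h, d in zip(terms, H, partials(*X))]
    print(scale, sum(terms) - total, g)
```

The middle column should be zero up to rounding, and the g_k in the last column should shrink as H does.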

The more interesting question to ask here is: why did we use the Mean Value Theorem? Why could we not have used the formula f(x_1+h_1,x_2+h_2,\dots,x_n+h_n)-f(x_1,x_2,\dots,x_n)\\=\sum\limits_{k=1}^n {\left[D_{x_k}f(x_1,x_2,\dots,x_k,\dots,x_n+h_n)+g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k)\right]\cdot h_k},

where \lim\limits_{h_k\to 0}g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k)=0?

This is because g_k(x_1,x_2,\dots,x_k,\dots,x_n+h_n,h_k) may not be defined at the point (x_1,x_2,\dots,x_n). If in fact every g_k is continuous at (x_1,x_2,\dots,x_n), then we wouldn’t have to use the Mean Value Theorem.

Watch this space for some more expositions on this topic.

One passing note as I end this article.

A function is differentiable at X if it can be expressed in this manner: f(X+H)-f(X)=\mathrm{grad}\,f(X)\cdot H+\|H\|g(X,H) such that \lim\limits_{\|H\|\to 0}g(X,H)=0. This is a necessary and sufficient condition because it is the definition of differentiability. It does not have a derivation. I spent a very long time trying to derive it before realising what a fool I had been.
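
The definition cannot be derived, but it can be watched at work. A small sketch, assuming f(x,y)=\sin(x)\,y^2 at the point X=(0.7,1.3) (both arbitrary picks of mine). It solves the defining equation for g(X,H) and prints it as H shrinks along one direction.

```python
import math

def f(x, y):
    # Illustrative differentiable function (an assumption for this sketch).
    return math.sin(x) * y ** 2

def grad_f(x, y):
    # Exact gradient of the function above.
    return (math.cos(x) * y ** 2, 2 * math.sin(x) * y)

def remainder(X, H):
    # g(X, H) = (f(X+H) - f(X) - grad f(X) . H) / ||H||
    gx, gy = grad_f(*X)
    linear = gx * H[0] + gy * H[1]
    return (f(X[0] + H[0], X[1] + H[1]) - f(*X) - linear) / math.hypot(*H)

X = (0.7, 1.3)
remainders = [remainder(X, (s, -s)) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
print(remainders)  # magnitudes should shrink towards 0 along with H
```

This only probes one direction of approach, so it illustrates the definition rather than proving differentiability.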