Fruits of procrastination

Month: September, 2013

Why substitution works in indefinite integration

Let’s integrate \int{\frac{dx}{\sqrt{1-x^2}}} . We know the trick: substitute x=\sin\theta. We get dx=\cos\theta d\theta. Substituting into the original integral, we get \int{\frac{\cos\theta d\theta}{\sqrt{1-\sin^2\theta}}}=\int{\frac{\cos\theta d\theta}{|\cos\theta|}}. Let us assume \cos\theta remains positive throughout the interval under consideration. Then the integral is \theta+c, or \arcsin x+c.

I have performed similar operations for close to five years of my life now. But I was never, ever, quite convinced by it. How can you, just like that, replace dx with \cos\theta d\theta? My teacher once told me this: \frac{dx}{d\theta}=\cos\theta. Multiplying by d\theta on both sides, we get dx=\cos\theta d\theta. What?!! It doesn’t work like that!!

It was a year back that I finally derived why this ‘ruse’ works.

Take the function x^2. If you differentiate this with respect to x, you get 2x. If you integrate 2x, you get x^2+c. Simple.

Now take the function \sin^2\theta. Differentiate it with respect to \theta. You get 2\sin\theta.\cos\theta. If you integrate 2\sin\theta.\cos\theta, you get \sin^2\theta+c.

The thing to notice is that when you integrate the two functions, 2x and 2\sin\theta\cos\theta, you want a function of the form y^2 (with y=x in the first case and y=\sin\theta in the second). However, and with respect to whatever variable, I integrate, I ultimately want a function of the form y^2, so that I can substitute x for y to get x^2.

In the original situation, let us imagine there’s a function f(x)=\int{\frac{dx}{\sqrt{1-x^2}}}. We’ll discuss the properties of f(x). If we were to make the substitution x=\sin\theta in f(x) and differentiate it with respect to \theta, we’d get a function of the form \frac{1}{\sqrt{1-y^2}}\cos\theta, where y is \sin\theta. There are two things to note here:

1. The derivative of f(x) with respect to \theta has the same form as f'(x), which is \frac{1}{\sqrt{1-y^2}}, multiplied by \cos\theta, the derivative of \sin\theta with respect to \theta.

2. When any function is differentiated with respect to any variable, integrating with respect to the same variable gives us back the same function (up to an additive constant). Hence, \int{\frac{\partial f}{\partial x}dx}=\int{\frac{\partial f}{\partial \theta}d\theta}, since both sides equal f.

Coming back to \int{\frac{dx}{\sqrt{1-x^2}}}, let us assume its integral is f(x). Its derivative, on substituting x=\sin\theta and differentiating with respect to \theta, is of the same form as \frac{\partial f}{\partial x}, multiplied by \cos\theta. This is a result of the chain rule of differentiation. Now, following rule 2, we know \int{\frac{dx}{\sqrt{1-x^2}}}=\int{\frac{\cos\theta d\theta}{\sqrt{1-\sin^2\theta}}}.
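As a sanity check of this identity, here is a minimal numerical sketch, assuming we integrate over x\in[0,0.5], which corresponds to \theta\in[0,\frac{\pi}{6}], where \cos\theta>0. The helper `simpson` (composite Simpson's rule) is our own; both sides should come out close to \arcsin 0.5=\frac{\pi}{6}.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Left side: integrate 1/sqrt(1 - x^2) directly over x in [0, 0.5].
lhs = simpson(lambda x: 1 / math.sqrt(1 - x * x), 0.0, 0.5)

# Right side: substitute x = sin(theta), dx = cos(theta) dtheta;
# the limits become theta in [0, asin(0.5)] = [0, pi/6].
rhs = simpson(lambda t: math.cos(t) / math.sqrt(1 - math.sin(t) ** 2),
              0.0, math.asin(0.5))

print(lhs, rhs, math.pi / 6)
```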

How is making the substitution x=\sin\theta justified? Could we have made any other continuous substitution, like x=\theta^2 +\tan\theta^3? Let us assume we substitute x=g(\theta). We want g(\theta) to take all the values x can take. This is the condition that must be satisfied by any substitution. For values that g(\theta) takes but x doesn’t, we restrict the domain of \theta so that the range of g(\theta) matches that of x. Note that the shapes of f(x) plotted against x and f(\sin\theta) plotted against \theta will be different. But that is irrelevant as long as we recover the same cartesian pairs (m,n), where m is the value of x and n is the corresponding value of f, whichever variable we plot against.
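The same kind of check works with a different substitution, x=g(\theta)=\theta^2, chosen here purely for illustration: g is continuous and increasing on [0,\sqrt{0.5}] and takes every value of x in [0,0.5] exactly once, so those are the matching limits. The names `f_prime`, `g` and `g_prime` are hypothetical helpers for this sketch.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f_prime(x):
    """The integrand 1/sqrt(1 - x^2), i.e. the derivative of f."""
    return 1 / math.sqrt(1 - x * x)

def g(theta):
    """Substitution x = g(theta) = theta^2 (increasing for theta >= 0)."""
    return theta ** 2

def g_prime(theta):
    """dx/dtheta, so that dx = g'(theta) dtheta."""
    return 2 * theta

lhs = simpson(f_prime, 0.0, 0.5)
# g(0) = 0 and g(sqrt(0.5)) = 0.5, so these theta-limits match x in [0, 0.5].
rhs = simpson(lambda t: f_prime(g(t)) * g_prime(t), 0.0, math.sqrt(0.5))
print(lhs, rhs)
```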

Summing up the argument: we predict the form the derivative of f(x) will take when the substitution x=\sin\theta is made, and then integrate this new form with respect to \theta to get back the original function. This is why the ‘trick’ works.

Fermat’s Last Theorem

When I was in high school, spurred by Mr. Scheelbeek’s end-of-term inspirational lecture on Fermat’s Last Theorem, I tried proving it for…about one and a half long years!
For documentation purposes, I’m attaching my proof. Feel free to outline the flaws in the comments section.

Let us assume the equation x^n + y^n =z^n has a solution in positive integers x,y,z. We know x^n + y^n<(x+y)^n (n is assumed to be greater than one here). Hence, z<x+y. Moreover, we know z^n-x^n<(z+x)^n. Hence, y<z+x. Similarly, x<y+z.

So we have the three inequalities: z<x+y, y<z+x, and x<y+z.

x,y,z satisfy the triangle inequalities! Hence, x,y,z form a triangle.

Using the cosine rule, we get z^2=x^2 +y^2 -2xy\cos C, where C is the angle opposite side z.

Raising both sides to the power \frac{n}{2}, we get z^n=(x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}. Now if n=2 and C=\frac{\pi}{2}, we get z^2=x^2+y^2. This is the case of the right-angled triangle.
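A small numerical look at the triangle step, a sketch only: for real x,y>0 and n>1, set z=(x^n+y^n)^{1/n} (so z is not required to be an integer), test the three triangle inequalities, and recover the angle C from the cosine rule. The function name `triangle_check` and the sample values x=3, y=4 are our own choices.

```python
import math

def triangle_check(x, y, n):
    """Return (triangle inequalities hold?, angle C in radians) for the
    real number z with x^n + y^n = z^n, using the cosine rule for C."""
    z = (x ** n + y ** n) ** (1 / n)
    holds = z < x + y and y < x + z and x < y + z
    cos_c = (x * x + y * y - z * z) / (2 * x * y)
    return holds, math.acos(cos_c)

results = {n: triangle_check(3.0, 4.0, n) for n in (2, 3, 4)}
for n, (holds, c) in results.items():
    print(n, holds, round(math.degrees(c), 2))
```

For n=2 this gives C=\frac{\pi}{2}; for n=3 and n=4 the angle C comes out acute.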

However, if n\geq 3, then the right hand side, which is (x^2 +y^2 -2xy\cos C)^{\frac{n}{2}}, is unlikely to simplify to x^n + y^n.

There are multiple flaws in this argument. Coming to terms with them was a huge learning experience.

Binomial probability distribution

What exactly is binomial distribution?

Q. A manufacturing process is estimated to produce 5\% nonconforming items. If a random sample of five items is chosen, find the probability of getting two nonconforming items.

Now one could say: let there be 100 items. Then the required probability would be \frac{{5\choose 2}{95\choose 3}}{{100\choose 5}} . In what order the items are chosen is irrelevant. This roughly comes out to be 0.018, while the answer is 0.021. Where did we go wrong?

Why should we assume there are 100 items in total? Let us instead let n\to\infty, and compute \frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}} . What if 0.95n and 0.05n are not integers? We use the gamma function.

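We get \frac{{.05n\choose 2}{.95n\choose 3}}{{n\choose 5}}=\frac{1}{{n\choose 5}}\cdot\frac{\Gamma(0.05n+1)}{2!\,\Gamma(0.05n-1)}\cdot\frac{\Gamma(0.95n+1)}{3!\,\Gamma(0.95n-2)}, where \Gamma(s+1)=\int_{0}^{\infty}{t^{s}e^{-t} dt}; for example, \Gamma(0.05n+1)=\int_{0}^{\infty}{t^{0.05n}e^{-t} dt}.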

My textbook says this tends to {5\choose 2}(0.05)^2 (0.95)^3. This is something you could verify for yourself.
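To see the limit numerically, here is a sketch that evaluates the left-hand side for growing n, using the gamma-function form {a\choose k}=\frac{\Gamma(a+1)}{k!\,\Gamma(a-k+1)} so that non-integer 0.05n and 0.95n are allowed. It works with `math.lgamma` to avoid overflow for large n; the function names `gen_comb` and `hypergeom` are our own.

```python
import math

def gen_comb(a, k):
    """Generalised binomial coefficient C(a, k) = Gamma(a+1) / (k! Gamma(a-k+1)),
    for real a > k - 1, computed through log-gamma to avoid overflow."""
    return math.exp(math.lgamma(a + 1) - math.lgamma(k + 1) - math.lgamma(a - k + 1))

def hypergeom(n, p=0.05, k=2, m=5):
    """P(k nonconforming in a sample of m) from n items, a fraction p nonconforming."""
    return gen_comb(p * n, k) * gen_comb((1 - p) * n, m - k) / gen_comb(n, m)

binom = math.comb(5, 2) * 0.05 ** 2 * 0.95 ** 3
for n in (100, 1_000, 10_000, 1_000_000):
    print(n, round(hypergeom(n), 5))
print("binomial", round(binom, 5))
```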

Another question. Say you roll a die 5 times. Find the probability of getting two 6s. The probability as determined by combinatorics is \frac{{5\choose 2}5^3}{6^5} . You must have applied the binomial distribution before in such problems. You know the answer to be {5\choose 2}(\frac{1}{6})^2 (\frac{5}{6})^3 . This matches the answer determined before. So why is it that we’re right here in determining the probability accurately, while we were not before?
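Both answers can be checked by brute force. This sketch enumerates all 6^5 equally likely sequences of five rolls with `itertools.product` and uses exact `Fraction` arithmetic, so the three numbers can be compared exactly.

```python
import math
from fractions import Fraction
from itertools import product

# Count the outcomes of five rolls that contain exactly two 6s.
favourable = sum(1 for rolls in product(range(1, 7), repeat=5) if rolls.count(6) == 2)
p_count = Fraction(favourable, 6 ** 5)

# Combinatorial count: choose which 2 rolls show 6; the other 3 show one of 5 faces.
p_comb = Fraction(math.comb(5, 2) * 5 ** 3, 6 ** 5)

# Binomial formula with success probability 1/6.
p_binom = math.comb(5, 2) * Fraction(1, 6) ** 2 * Fraction(5, 6) ** 3

print(p_count, p_comb, p_binom)  # 625/3888 625/3888 625/3888
```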

Binomial probability agrees with elementary probability when separate arrangements of the selected items are counted as distinct arrangements and the total number of items is known, not just guessed at. When the total number of items is not known and only a percentage (the percentage of successes) is known, binomial probability is an approximation, arrived at by letting n approach infinity.

Continuous linear operators are bounded: decoding the proof, and how the mathematician chances upon it

Here we try to prove that a linear operator, if continuous, is bounded.

Continuity implies: for any \epsilon>0 there is a \delta>0 such that \|Tx-Tx_0\|<\epsilon whenever \|x-x_0\|<\delta.

We want the following result: \frac{\|Ty\|}{\|y\|}\leq c , where c is a constant, and y is any nonzero vector in X.

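What constants can be construed from \epsilon and \delta, knowing that they are prone to change? If we fix one \epsilon and a corresponding \delta, then \frac{\epsilon}{\delta} is a constant; and as T is linear, \|Tx-Tx_0\|=\|T(x-x_0)\|, so the same pair works at every point x_0. We need to use this knowledge.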

We want \frac{\|Ty\|}{\|y\|}\leq \frac{\epsilon}{\delta} , or \delta\frac{\|Ty\|}{\|y\|}\leq {\epsilon} .

We have \|Tx-Tx_0\|=\|T(x-x_0)\|<\epsilon.

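This suggests taking x-x_0=\delta.\frac{y}{\|y\|} . (This vector has norm exactly \delta, so strictly we use the non-strict form of continuity: after shrinking \delta if necessary, \|x-x_0\|\leq\delta implies \|Tx-Tx_0\|\leq\epsilon.)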

Then \|T(\delta.\frac{y}{\|y\|})\|=\frac{\delta}{\|y\|}\|Ty\|\leq\epsilon , which gives \frac{\|Ty\|}{\|y\|}\leq\frac{\epsilon}{\delta} .

We have just deconstructed the proof given on pg. 97 of Kreyszig’s book on Functional Analysis. The substitution x-x_0=\delta.\frac{y}{\|y\|} did not just occur to him by magic. It was the result of thorough analysis, and probably of an investigation much like this one.
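A numerical sketch for one concrete operator, assuming X=\Bbb{R}^2 with the Euclidean norm and T a hypothetical 2\times2 matrix (neither comes from Kreyszig). For a matrix, \delta=\epsilon/\|T\|_F works in the continuity condition, since the Frobenius norm \|T\|_F bounds \|Tv\|/\|v\|. The script applies the substitution x-x_0=\delta\frac{y}{\|y\|} to random y and checks that \frac{\|Ty\|}{\|y\|} never exceeds \frac{\epsilon}{\delta}.

```python
import math
import random

T = [[2.0, -1.0],
     [0.5, 3.0]]          # hypothetical linear operator on R^2

def apply(A, v):
    """Matrix-vector product A v for a 2x2 matrix."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def norm(v):
    return math.hypot(v[0], v[1])

eps = 1.0
frob = math.sqrt(sum(a * a for row in T for a in row))
delta = eps / frob        # ||x - x0|| <= delta  implies  ||T(x - x0)|| <= eps

random.seed(0)
worst = 0.0
for _ in range(10_000):
    y = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if norm(y) == 0:
        continue
    # The substitution x - x0 = delta * y / ||y|| has norm exactly delta ...
    h = [delta * c / norm(y) for c in y]
    # ... and by linearity ||T h|| = (delta / ||y||) * ||T y||, which stays <= eps.
    assert norm(apply(T, h)) <= eps + 1e-12
    worst = max(worst, norm(apply(T, y)) / norm(y))

print(worst, eps / delta)   # largest observed ratio vs. the bound eps/delta
```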

But hey! Let’s investigate this. \frac{\delta}{\epsilon} is also constant! Let us assume \epsilon\frac{\|Ty\|}{\|y\|}\leq \delta . Multiplying both sides by \frac{\epsilon}{\delta} , we get \frac{\epsilon^2}{\delta}\frac{\|Ty\|}{\|y\|}\leq \epsilon . This suggests x-x_0=\frac{\epsilon^2}{\delta}\frac{y}{\|y\|} . Does this substitution also prove boundedness?

We have to show \|x-x_0\|<\delta . \frac{\epsilon^2}{\delta}<\delta only if \epsilon<\delta . Hence, this is conditionally true.

Similar investigations taking (\frac{\epsilon}{\delta})^n to be constant can also be conducted.

Linear operators mapping finite dimensional vector spaces are bounded

Theorem: Every linear operator T:V\to W, where V is a finite dimensional normed space, is bounded.

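Proof: Let e_1,\dots,e_n be a basis of V and write x=a_1e_1+a_2e_2+\dots+a_ne_n. By a standard lemma, there is a constant c>0 such that \|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq c(|a_1|+|a_2|+\dots+|a_n|) for every choice of scalars. Then, for x\neq 0,

\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{|a_1|\|T(e_1)\|+|a_2|\|T(e_2)\|+\dots+|a_n|\|T(e_n)\|}{c(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\|T(e_i)\|}{c}

where \|T(e_i)\|=\max\{\|T(e_1)\|,\|T(e_2)\|,\dots,\|T(e_n)\|\}.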

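What we learn from here is the two-sided bound below, where, in the upper bound (which is just the triangle inequality), \|e_i\| denotes the largest of \|e_1\|,\dots,\|e_n\|:

\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq c(|a_1|+|a_2|+\dots+|a_n|)

If, for the basis at hand, the lower bound also holds with c replaced by \|e_k\|, the smallest of \|e_1\|,\dots,\|e_n\|, i.e.

\|e_i\|(|a_1|+|a_2|+\dots+|a_n|)\geq\|a_1e_1+a_2e_2+\dots+a_ne_n\|\geq \|e_k\|(|a_1|+|a_2|+\dots+|a_n|)

then we get another proof of the assertion:

\frac{\|Tx\|}{\|x\|}=\frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|a_1e_1+a_2e_2+\dots+a_ne_n\|}\leq \frac{\|T(a_1e_1+a_2e_2+\dots+a_ne_n)\|}{\|e_k\|(|a_1|+|a_2|+\dots+|a_n|)}\leq \frac{\|T(e_i)\|}{\|e_k\|}

(with \|T(e_i)\| the maximum from the first proof), which is a constant.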

Note: why does this not work in infinite dimensional spaces? Because with infinitely many basis vectors, the max and min of \|e_r\| and \|Te_r\| might not exist.
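A numerical sketch of the first proof, assuming V=W=\Bbb{R}^2 with the Euclidean norm, a hypothetical non-orthogonal basis e_1=(1,0), e_2=(1,0.2) and a hypothetical matrix T. The constant c is the minimum of \|a_1e_1+a_2e_2\| over |a_1|+|a_2|=1; we only estimate it by sampling that set finely, so the printed bound \frac{\max_j\|T(e_j)\|}{c} is approximate.

```python
import math
import random

e = [(1.0, 0.0), (1.0, 0.2)]          # hypothetical basis e_1, e_2 of R^2
T = [[1.0, 2.0], [-3.0, 0.5]]         # hypothetical linear operator on R^2

def combo(a):
    """The vector a_1 e_1 + a_2 e_2."""
    return (a[0] * e[0][0] + a[1] * e[1][0], a[0] * e[0][1] + a[1] * e[1][1])

def apply(v):
    """T applied to v."""
    return (T[0][0] * v[0] + T[0][1] * v[1], T[1][0] * v[0] + T[1][1] * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

# Estimate c = min ||a_1 e_1 + a_2 e_2|| over the "diamond" |a_1| + |a_2| = 1.
N = 100_000
c = float("inf")
for i in range(N + 1):
    t = -1 + 2 * i / N
    for sign in (1.0, -1.0):
        c = min(c, norm(combo((t, sign * (1 - abs(t))))))

bound = max(norm(apply(v)) for v in e) / c     # max_j ||T e_j|| / c

random.seed(1)
worst = 0.0
for _ in range(10_000):
    x = combo((random.uniform(-1, 1), random.uniform(-1, 1)))
    if norm(x) > 0:
        worst = max(worst, norm(apply(x)) / norm(x))

print(round(c, 4), round(bound, 2), round(worst, 2))
```

A nearly parallel basis like this one makes c small, so the bound is valid but loose.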

Riesz’s lemma decoded

This is a rant on Riesz’s lemma.

Riesz’s lemma: Let Z be a normed space and Y\subset Z a closed proper subspace. Then for every \theta\in (0,1) there exists a z\in Z with \|z\|=1 such that \|z-y\|\geq \theta for all y\in Y.

A proof is commonly available. What we will discuss here is the thought behind the proof.

Take any z\in Z\setminus Y and let a=\inf_{y\in Y}\|z-y\|; a>0 because Y is closed and z\notin Y. Then \|z-y\|\geq a for every y\in Y. Also, since \frac{a}{\theta}>a, there exists a y_0\in Y such that \|z-y_0\|\leq\frac{a}{\theta}. Then, for every y\in Y, \left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\frac{a}{a/\theta}=\theta. Because Y is closed under scalar multiplication, \frac{y}{\|z-y_0\|} runs over all of Y as y does, so we have effectively shown that the vector \frac{z}{\|z-y_0\|} is at distance at least \theta from every point of Y, for any given \theta\in (0,1).

The same holds if we divide by any other positive number not exceeding \frac{a}{\theta}: if v,v_0 are vectors with 0<\|v-v_0\|\leq\frac{a}{\theta}, then \left\|\frac{z}{\|v-v_0\|}-\frac{y}{\|v-v_0\|}\right\|\geq\theta.

Hence, one part of Riesz’s lemma, that of exceeding \theta, is satisfied (after scaling) by every vector z\in Z\setminus Y. The thoughts to take away from this are: dividing by \theta, or by any number less than 1, increases everything; even a small increase above the infimum exceeds some terms of a sequence converging to the infimum; and every term of such a sequence is at least the infimum. When we say \theta can be any number in the interval (0,1), we know we’re skirting the boundaries. We could also have thought of a proof in this direction: let b=\sup_{y\in Y} \|z-y\|, assuming this supremum is finite. Then there is a y_0\in Y with b\theta\leq\|z-y_0\|\leq b, and for an arbitrary y\in Y, \left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\leq\frac{1}{\theta}.

Hence, for every \theta\in (0,1), \theta\leq \|z-y\|\leq\frac{1}{\theta}.

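Now what about \|z\|=1? This condition is satisfied when we replace z by z-y_0 in the expression \left\|\frac{z}{\|z-y_0\|}-\frac{y}{\|z-y_0\|}\right\|\geq\theta. The vector z_\theta=\frac{z-y_0}{\|z-y_0\|} has norm 1, and since y_0+y\in Y for every y\in Y, \left\|z_\theta-\frac{y}{\|z-y_0\|}\right\|=\frac{\|z-(y_0+y)\|}{\|z-y_0\|}\geq\theta.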
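A numerical sketch of this construction, assuming Z=\Bbb{R}^2 with the max norm \|(u,w)\|=\max(|u|,|w|), Y=\text{span}\{(1,1)\} (a closed proper subspace), z=(1,0) and \theta=0.9; all of these are our own choices. The infimum a (exactly \frac12 here) and the distance from z_\theta to Y are only approximated on a grid of points of Y, so this illustrates the steps rather than proving anything.

```python
def norm(v):
    """Max norm on R^2: ||(u, w)|| = max(|u|, |w|)."""
    return max(abs(v[0]), abs(v[1]))

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def on_Y(t):
    """The point t * (1, 1) of the subspace Y."""
    return (t, t)

z = (1.0, 0.0)            # a vector outside Y
theta = 0.9
grid = [-2 + 4 * i / 40_000 for i in range(40_001)]

# a = inf over y in Y of ||z - y||, estimated on the grid.
a = min(norm(sub(z, on_Y(t))) for t in grid)

# Any y0 in Y with ||z - y0|| <= a / theta will do; take the first one found.
t0 = next(t for t in grid if norm(sub(z, on_Y(t))) <= a / theta)
d0 = norm(sub(z, on_Y(t0)))

# The unit vector z_theta = (z - y0) / ||z - y0||.
z_theta = tuple(c / d0 for c in sub(z, on_Y(t0)))

# Its (grid-estimated) distance to Y should be at least theta.
dist = min(norm(sub(z_theta, on_Y(s))) for s in grid)
print(a, t0, norm(z_theta), dist)
```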

Hence, all in all, for every vector z\in Z\setminus Y there are infinitely many vectors which satisfy the distance condition of Riesz’s lemma. Also, for every such z, there is AT LEAST one unit vector which satisfies Riesz’s lemma (there can be more than one). Hence, to think there can be only one unit vector in Z\setminus Y which satisfies Riesz’s lemma would be erroneous.

Completing metric spaces

If you’ve read the proof of the “completion of a metric space”, then you surely must have asked yourself “WHY?”! Say we have an incomplete metric space X. Why can’t we just complete X by including the limit points of all its Cauchy sequences?!

No. We can’t. A Cauchy sequence in X that does not converge has no limit in X, and there is no larger space around X in which to find one; the limit points we would like to add do not exist yet.

The new space \overline{X} that we create: is it just X\cup \{\text{limit points of Cauchy sequences in X}\}? No. It is a completely different space.

So what exactly is \overline{X}? \overline{X} is a space with a new bunch of points: equivalence classes of Cauchy sequences in X, where \{a_n\}\sim\{b_n\} iff \lim\limits_{n\to\infty}d(a_n,b_n)=0.
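A small illustration of the equivalence relation, assuming X=\Bbb{Q} with d(a,b)=|a-b| (our choice of example): two Cauchy sequences of rationals that "want" to converge to \sqrt{2}, which is not in X. Exact rational arithmetic with Python's `fractions.Fraction` keeps every term inside \Bbb{Q}; d(a_n,b_n) shrinks towards 0, so the two sequences lie in the same equivalence class, i.e. they are the same point of \overline{X}. The helper names `newton` and `truncation` are our own.

```python
from fractions import Fraction
from math import isqrt

def newton(n):
    """n-th Newton iterate for x^2 = 2, starting from 1 (every term is rational)."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def truncation(n):
    """sqrt(2) truncated to n decimal places, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

for n in range(1, 7):
    a, b = newton(n), truncation(n)
    print(n, a, b, float(abs(a - b)))   # d(a_n, b_n) tends to 0
```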

If you read the proof, you’ll realise it does a bunch of random crap to prove \overline{X} is complete. WHY?? Couldn’t it have been simpler, with less of the dense sets and the like?

Let’s create a Cauchy sequence of the equivalence classes. How do we know that the limit point of this sequence exists? We’re stuck here. One wouldn’t know how to proceed.

On a more important note, we just have a bunch of equivalence classes whose limits we do not know. We have no idea how they behave with respect to each other. If we had equivalence classes whose limit points we do know, then we would have some perspective on the structure of the space and on what the limit point of the Cauchy sequence is. We might not even know the terms of some such equivalence classes. How’re we supposed to analyze things we have absolutely no idea about?

Some information is better than no information. If we could find out the limit points of all such equivalence classes (or terms of the Cauchy sequence, in this case), we could think of doing something productive. But we can’t determine the limit points. So what now?

Consider all equivalence classes of Cauchy sequences which converge to points in the space X. This set is dense in \overline{X} (this is easy to prove).

A fundamental concept is this: let us take a Cauchy sequence \{a_1,a_2,a_3,\dots\}, and another Cauchy sequence \{b_1,b_2,b_3,\dots\} which converges to a_N. Then \lim\limits_{n\to\infty}d(a_n,b_n)=\lim\limits_{n\to\infty}d(a_n,a_N)=\epsilon_N, a fixed number that depends on N. As N increases, \epsilon_N\to 0, so in this sense the Cauchy sequences \{b_i\} converge to \{a_i\}. Hence, we extrapolate from the convergence of points to the convergence of converging sequences. Can we think about the convergence of converging sequences in any other way? Something to think about. But this is definitely a useful concept to remember. Note that the limit point of \{a_i\} may not even be known.

So how is this concept relevant to the proof? We’ve associated with the original Cauchy sequence \{x_i\} another Cauchy sequence \{b_i\} with limit points in the space, as mentioned before. The association is such that \lim b_i=\lim x_i. Now the masterstroke: we map each sequence to its limit in the original space X, i.e. we map a sequence \{y_i\} converging to l to the point l in X. Isn’t that a lot of potentially useless mapping? No. This is explained below.

What do we have here? We have a Cauchy sequence \{l_1,l_2,l_3,\dots\}. This may or may not have a limit in X, which is inconsequential to the proof. Now let us take the Cauchy sequences \{t_i\} converging to l_i. We know from before that \lim\limits_{n\to\infty}d(l_n,t_n)=0. Now let us take the equivalence classes of the sequence \{l_1,l_2,\dots\} and of the sequences \{t_i\}. The Cauchy sequence of equivalence classes of the \{t_i\} will obviously converge to the equivalence class of \{l_i\}. As a result, the original \{x_i\} also converges to the equivalence class of \{l_i\}. We had associated other sequences with \{x_i\} just so that we could get sequences converging to the terms of \{l_1,l_2,\dots\}.

What is the point of creating these equivalence classes? Couldn’t we have formed a complete metric space in some other way? Thinking about Cauchy sequences, something that immediately pops into mind is Cauchy sequences of Cauchy sequences. Cauchy sequences of what else can be formed? Cauchy sequences of squares of points? Will that space really be complete? Maybe there are other possibilities to form a complete metric space from X, but this is one that easily pops into mind after one gets comfortable with the concept of the Cauchy sequence \{l_1,l_2,\dots\} and the sequences \{t_i\} converging to l_i. Whether metric spaces can be completed in other ways is something you and I should think about.


Today we will discuss the proof of o(ST)=\frac{o(S)o(T)}{o(S\cap T)}.
Here, S and T are finite subgroups of a group G, and ST=\{st: s\in S, t\in T\}. We know S\cap T\neq\emptyset, as e\in S\cap T.

Let s_1t_1=s_2t_2. Then s_2^{-1}s_1=t_2t_1^{-1}\in S\cap T. Conversely, take any a\in S\cap T. For any s_1\in S and t_1\in T, set s_2=s_1a^{-1} and t_2=at_1. Then s_2t_2=s_1a^{-1}at_1=s_1t_1. Hence, for any s_1\in S, t_1\in T, there are exactly |S\cap T| pairs (s_2,t_2)\in S\times T such that s_2t_2=s_1t_1, one for each a=s_2^{-1}s_1\in S\cap T. So the o(S)o(T) pairs in S\times T split into equivalence classes, one for each element of ST, all with |S\cap T| elements. This shows o(ST)=\frac{o(S)o(T)}{o(S\cap T)}.
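A brute-force check of the counting argument, a sketch using two subgroups of the symmetric group on \{0,1,2,3\} that we picked for illustration: S, generated by the 4-cycle (0\,1\,2\,3), and T, the Klein four-group, which share the element (0\,2)(1\,3). Permutations are stored as tuples, and the helpers `compose` and `generate` are our own.

```python
from collections import Counter
from itertools import product

def compose(p, q):
    """(p * q)(i) = p(q(i)) for permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Subgroup generated by gens: close the identity under right
    multiplication by generators (enough in a finite group)."""
    group = {tuple(range(len(gens[0])))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for h in gens:
            k = compose(g, h)
            if k not in group:
                group.add(k)
                frontier.append(k)
    return group

S = generate([(1, 2, 3, 0)])                 # cyclic subgroup of order 4
T = generate([(2, 3, 0, 1), (1, 0, 3, 2)])   # Klein four-group
counts = Counter(compose(s, t) for s, t in product(S, T))
ST = set(counts)

print(len(S), len(T), len(S & T), len(ST))   # 4 4 2 8
# Every element of ST arises from exactly |S ∩ T| pairs (s, t).
print(set(counts.values()))                  # {2}
```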

We can also digress to more complicated situations like o(ST+W), and find similar formulae.

A new proof of Cauchy’s theorem

We will discuss a more direct proof of Cauchy’s theorem than the one given in Herstein’s “Topics in Algebra” (pg. 61).

Statement: If G is a finite abelian group and p is a prime with p|o(G), then there is an element g\in G such that g^{p}=e_G and g\neq e_G.

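We will prove this by induction on o(G). Let us assume that for every abelian group H with o(H)<o(G), if p|o(H), then \exists h\in H, h\neq e_H, with h^p=e_H. Let N be a nontrivial proper subgroup of G (normal by default, since G is abelian); such an N exists unless G is cyclic of prime order, in which case o(G)=p and any non-identity element works. If p|o(N), by the induction hypothesis, \exists n\in N, n\neq e_N, with n^p=e_N=e_G.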

Let us now assume p|o(G) but p\nmid o(N). This implies p|\frac{o(G)}{o(N)}\implies p|o\left(\frac{G}{N}\right). As o\left(\frac{G}{N}\right)<o(G), by the induction hypothesis, \exists (Nb)\in \frac{G}{N} with Nb\neq N (so b\notin N) such that (Nb)^{p}=Nb^p=N. This implies b^p\in N\implies b^{p.o(N)}=e (e_G shall simply be referred to as e from now on). Hence b^{o(N)} is an element of G which gives e when raised to the power p.

Now all we have to prove is b^{o(N)}\neq e. Given below is my original spin on the proof.

We know p\nmid o(N). And as p is prime, p and o(N) have no common factor other than 1. Hence \gcd (p,o(N))=1. This proves there exist integers u,v\in\Bbb{Z} such that u.p+v.o(N)=1. Also, note that if b^{p}\in N, then (b^{p})^{z}\in N, for any z\in \Bbb{Z}.

Let us now assume b^{o(N)}=e. Then (b^{o(N)})^{v}.(b^{p})^{u}=e.(b^{p})^{u}\in N. Also note that (b^{o(N)})^{v}.(b^{p})^{u}=b^{u.p+v.o(N)}=b. The two statements imply b\in N. This contradicts the assumption that b\notin N. Now you would ask: where was this assumption made?! It was made when we chose Nb to be a non-identity element of \frac{G}{N} with (Nb)^p=N. Had b been a part of N, Nb would be the identity coset N (and indeed b^{o(N)}=e, by Lagrange’s theorem).

There’s an extraordinarily powerful trick I’d like to point out and explain here. When you have statements about b^a and b^c, where \gcd (a,c)=1, then we can make a statement about b by virtue of the fact \exists z_1,z_2\in\Bbb{Z} such that z_1.a+z_2.c=1.
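The trick can be tried out numerically. This sketch assumes the multiplicative group (\Bbb{Z}/13\Bbb{Z})^* (abelian, of order 12) and the coprime exponents 3 and 4, with 4 playing the role of o(N); these choices, and the helper `ext_gcd` (the extended Euclidean algorithm), are ours. Knowing only b^3 and b^4, we recover b as (b^3)^{u}(b^4)^{v}, where 3u+4v=1.

```python
def ext_gcd(a, b):
    """Extended Euclid: return (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

MOD = 13          # the multiplicative group (Z/13Z)^*, abelian of order 12
ORDER = MOD - 1   # b^ORDER == 1 for every b in the group (Fermat's little theorem)
p, m = 3, 4       # coprime exponents; m plays the role of o(N)

g, u, v = ext_gcd(p, m)
assert g == 1 and u * p + v * m == 1

for b in range(1, MOD):
    bp, bm = pow(b, p, MOD), pow(b, m, MOD)   # only b^p and b^m are "known"
    # b = b^(u*p + v*m) = (b^p)^u * (b^m)^v; exponents may be reduced mod ORDER,
    # which also turns a negative u or v into a non-negative one.
    recovered = pow(bp, u % ORDER, MOD) * pow(bm, v % ORDER, MOD) % MOD
    assert recovered == b
print("u, v =", u, v)
```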

Now we consider the proof of Sylow’s theorem for abelian groups, which runs along similar lines.

The statement is: if p is prime, p^\alpha|o(G) and p^{\alpha+1}\nmid o(G), then there is a subgroup of order p^{\alpha} in G.

We will again prove by induction. If p^{i}|o(N), where N is a normal subgroup of G and i\leq \alpha, then the statement is true. Hence, let p^{\alpha}|o\left(\frac{G}{N}\right). This again makes \gcd (p^\alpha,o(N))=1. The rest of the proof is elementary.

Anti-climax: The extension to Sylow’s theorem is incorrect. Please try to determine the flaw yourself.

Hint: the induction hypothesis is “for groups H of order smaller than o(G), if p^{\alpha}|o(H) and p^{\alpha+1}\nmid o(H), then there exists an element h\in H such that h^{p^\alpha}=e.” Second hint: if p^{\alpha}\nmid o(N), then that does not imply p^{\alpha}|\frac{o(G)}{o(N)}. Moreover, it is not necessary that \gcd(o(N),p^\alpha)=1.

Today we will discuss compactness in the metric setting. Why metric? Because metric spaces lend themselves more easily to visualisation than other spaces.

Let us imagine a metric space X with points scattered all over it. If we can find an infinite number of such points that are pairwise at least some fixed distance r>0 apart (so that the open balls of radius \frac{r}{2} centred on them are disjoint), then X cannot be compact.
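A small numerical sketch of this contrast, assuming we pick points greedily so that each kept point is at least r away from all earlier ones. In the compact interval [0,1] such a set can never have more than \lfloor 1/r\rfloor+1 points, however many candidates we offer; in [0,L] the bound is \lfloor L/r\rfloor+1, which grows without limit as L grows, mirroring the fact that the non-compact [0,\infty) admits infinitely many such points. The helper `greedy_separated` and the sample sizes are our own choices.

```python
import math
import random

def greedy_separated(points, r):
    """Scan points in increasing order, keeping one only if it lies at least r
    beyond the last kept point; the kept points are then pairwise >= r apart."""
    kept = []
    for x in sorted(points):
        if not kept or x - kept[-1] >= r:
            kept.append(x)
    return kept

random.seed(2)
r = 0.1
counts = {}
for L in (1, 10, 100, 1000):
    sample = [random.uniform(0, L) for _ in range(5_000)]
    counts[L] = len(greedy_separated(sample, r))
print(counts, "bound for [0, 1]:", math.floor(1 / r) + 1)
```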

Hence what does it mean to be compact in a metric setting?

Compactness implies that infinitely many points can’t all be ‘far’ away from each other. There can only be a finite number of “clumps” of points such that each neighbourhood, however small, contains an infinite number of such “clumped-together” points. So should you peer at one clump through a microscope, however strongly you magnify the clump, you will not see discrete points. You will see an impossibly dense patch that shall remain a solid, continuous clump of points.