cozilikethinking


Complete metric spaces are Baire spaces- a discussion of the proof.

I refer to the proof of the statement “Every complete metric space is a Baire space.”
The proof of this statement, as given in “Introduction to Banach spaces and their Geometry”, by Bernard Beauzamy, is

Let U_1,U_2,\dots be a countable collection of open dense subsets of a complete metric space X. Take any open set W. We will prove that (\bigcap U_i)\cap W is non-empty. Since U_1 is dense, U_1\cap W\neq\emptyset. For a point x_1\in U_1\cap W, construct \overline{B}(x_1,r_1)\subset U_1\cap W. Here \overline{B}(x_1,r_1) is the closure of B(x_1,r_1). Now take the intersection of U_2 and \overline{B}(x_1,r_1), and construct \overline{B}(x_2,r_2)\subset U_2\cap \overline{B}(x_1,r_1). Similarly, for every n\in\Bbb{N}, construct \overline{B}(x_n,r_n)\subset U_n\cap \overline{B}(x_{n-1},r_{n-1}). Also, ensure that 0<r_k<\frac{1}{k}; in other words, ensure that r_k converges to 0.

It is clear that the points \{x_1,x_2,\dots\} form a Cauchy sequence: for m,n\geq k, both x_m and x_n lie in \overline{B}(x_k,r_k), and r_k\to 0. As X is complete, the limit l of this sequence exists in X. This limit lies in every \overline{B}(x_n,r_n), hence in every U_n, and also in W (since \overline{B}(x_1,r_1)\subset W). Hence, it is present in \bigcap U_i. Hence (\bigcap U_i)\cap W is not empty. This proves the theorem.

 

We have taken one open set W, and proved that (\bigcap U_i)\cap W is not empty. Should we take another open set T, we will again be able to prove that (\bigcap U_i)\cap T is not empty. Note that this does not in any way imply that (\bigcap U_i)\cap W=(\bigcap U_i)\cap T. All that it says is that every non-empty open set W has a non-empty intersection with (\bigcap U_i), which is sufficient for (\bigcap U_i) to be a dense set in X.

Why do we not take B(x_k,r_k) instead of \overline{B}(x_k,r_k)? This is because if the balls are not closed, then it would be difficult to prove that the limit point of the sequence \{x_1,x_2,\dots\} is also present in \bigcap_i B(x_i,r_i). Try this for yourself. Also, it is fine to take closed balls because if U_i has a non-empty intersection with B(x_{i-1},r_{i-1}), then it definitely has a non-empty intersection with \overline{B}(x_{i-1},r_{i-1}), and we are only concerned with getting an intersection so that we can construct another closed ball inside it. It is elementary to prove that the limit point of \{x_1,x_2,\dots\} has to lie inside the intersection of all the closed balls (hint: \overline{B}(x_k,r_k) contains all x_j of the Cauchy sequence \{x_1,x_2,x_3,\dots\}, where j\geq k).
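To make the construction concrete, here is a minimal numerical sketch in Python (my own illustration, not part of the original proof): take X=\Bbb{R}, W=(0,1), and U_n=\Bbb{R}\setminus\{q_n\} for an enumeration q_1,q_2,\dots of some rationals; each such U_n is open and dense. The nested closed balls (here, closed intervals) shrink onto a point of W that avoids every q_n processed.

```python
from fractions import Fraction

def some_rationals(n_max):
    """A simple enumeration of a few rationals in (0, 1); removing a single
    rational from R leaves an open dense set (a stand-in family of U_n's)."""
    qs = []
    for den in range(2, 200):
        for num in range(1, den):
            q = Fraction(num, den)
            if q not in qs:
                qs.append(q)
                if len(qs) == n_max:
                    return qs
    return qs

def nested_closed_balls(num_steps=15):
    lo, hi = Fraction(0), Fraction(1)                 # start inside W = (0, 1)
    for n, q in enumerate(some_rationals(num_steps), start=1):
        mid = (lo + hi) / 2
        x = mid if mid != q else (lo + mid) / 2       # centre x_n, chosen away from q_n
        # radius r_n: keep the closed ball inside the previous one, away from q_n,
        # and smaller than 1/n so that the radii shrink to 0
        r = min(abs(x - q), x - lo, hi - x, Fraction(1, n)) / 2
        lo, hi = x - r, x + r
        print(f"step {n}: [{float(lo):.8f}, {float(hi):.8f}] avoids q_{n} = {q}")

nested_closed_balls()
```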

Today I plan to write a treatise on \ell_p^n spaces. \ell_p^n is the normed space \Bbb{R}^n equipped with the p-norm \|\cdot\|_p.

Say we have the \ell_p^n space over \Bbb{R}^n. This just means that \|x\|_p=\left( |x_1|^p + |x_2|^p+\dots +|x_n|^p\right)^{\frac{1}{p}}, where x\in \Bbb{R}^n. That \|\cdot\|_p is a norm is proved using standard arguments (the triangle inequality is Minkowski’s inequality, which is non-trivial).

Now we have a metric on \ell_p^n spaces: d(x,y)=\|x-y\|_p=\left( |x_1-y_1|^p + |x_2-y_2|^p+\dots +|x_n-y_n|^p\right)^{\frac{1}{p}}.

Now we prove that every \ell_p^n space is complete. Say we have a Cauchy sequence \{x_1,x_2,x_3,\dots\} in \ell_p^n. This means that for every \epsilon>0, there exists an N\in\Bbb{N} such that \|x_i-x_j\|_p<\epsilon for all i,j>N. Since each coordinate satisfies |(x_i)_e-(x_j)_e|\leq\|x_i-x_j\|_p for any e\in\{1,2,\dots,n\}, every coordinate sequence is Cauchy in \Bbb{R}. As \Bbb{R} is complete, there exists a limit for each coordinate. Using standard arguments from here, we can prove that the sequence converges to the vector of coordinate-wise limits, so \ell_p^n is complete.
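Here is a small Python sketch (my own illustration; the function names and the example sequence are my own choices) of the two facts used above: each coordinate is controlled by the p-norm, so a Cauchy sequence in \ell_p^n is Cauchy in every coordinate, and the coordinate-wise limit is also the limit in the p-norm.

```python
def p_norm(x, p):
    """The l_p^n norm: (|x_1|^p + ... + |x_n|^p)^(1/p)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def p_dist(x, y, p):
    return p_norm([a - b for a, b in zip(x, y)], p)

p = 3
# A Cauchy sequence in l_3^2: x_i = (1 + 1/i, 2 - 1/i), which should converge to (1, 2).
limit = (1.0, 2.0)
for i in [1, 10, 100, 1000]:
    x = (1 + 1 / i, 2 - 1 / i)
    coord_gap = max(abs(a - b) for a, b in zip(x, limit))   # largest coordinate error
    print(f"i = {i}: p-norm distance = {p_dist(x, limit, p):.6f}, "
          f"max coordinate gap = {coord_gap:.6f}")
```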

\ell_p^\infty spaces are also complete.

Sufficient conditions for differentiability in multi-variable calculus.

We will be focusing on sufficient conditions for differentiability of f:\Bbb{R}^2\to \Bbb{R}. The theorem says that if f_x and f_y exist and are continuous at the point (a,b), then f is differentiable at (a,b).

We have f(x,y), which we know is partially differentiable with respect to x and y, but may not be differentiable in general.

Differentiability at a point a in the \Bbb{R}^n\to \Bbb{R} setting is described as \frac{f(a+h)-f(a)}{\|h\|}-D \frac{h}{\|h\|}=\epsilon(h).

D is the matrix of partial derivatives with respect to the n independent variables in \Bbb{R}^n.

What does all this mean? This is something that confused me for some time, and spelling it out is likely to be helpful for others with the same doubts.

This is the \epsilon-\delta definition of differentiation. We say \lim\limits_{h\to 0} \frac{f(a+h)-f(a)}{h}, if it exists, is the derivative of f at a. Let the limit be l. We’re effectively saying that for any \epsilon>0, there exists \delta>0 such that \left|\frac{f(a+h)-f(a)}{h}-l\right|<\epsilon whenever 0<|h|<\delta. Here h=x-a. However, this is not what we SEEM to say through the multi-variable definition above. What we seem to be saying there is that for any \delta>0 and 0<\|h\|<\delta, there exists \epsilon such that \frac{f(a+h)-f(a)}{\|h\|}-D \frac{h}{\|h\|}=\epsilon(h). Is there a difference? Yes. This will be illustrated below.

In our example of multi-variable differentiation, we’ve made \epsilon a function of h. Why is that? What we mean by that is not that \epsilon can be any function of h, like h+c, where c is a constant. What we mean is \lim\limits_{h\to 0} \epsilon(h)=0, although this is not obvious from the mere fact that \epsilon is a function of h. Why should we explicitly mention the fact that \epsilon should tend to 0 as h\to 0? Because in the original definition \frac{f(a+h)-f(a)}{\|h\|}-D \frac{h}{\|h\|}=\epsilon(h), we have only made the argument that for any \delta>0 and 0<\|h\|<\delta, there exists an \epsilon such that the difference between \frac{f(a+h)-f(a)}{\|h\|} and D \frac{h}{\|h\|} is \epsilon. Here, \epsilon may not converge to 0! We have just asserted its existence, and none of its properties! For example, we could have said that for any \delta>0 and 0<\|h\|<\delta, \frac{f(a+h)-f(a)}{\|h\|}-D \frac{h}{\|h\|}=2. Here \epsilon=2. We have proven the existence of \epsilon.

It is only when we specify that \lim\limits_{h\to 0} \epsilon(h)=0 that we make the definition of the derivative clear: the derivative is the linear map D for which \lim\limits_{h\to 0}\frac{f(a+h)-f(a)-Dh}{\|h\|}=0. Such a limit statement means that for any \epsilon>0, there exists \delta>0 such that \left |\frac{f(a+h)-f(a)-Dh}{\|h\|}\right|<\epsilon when 0<\|h\|<\delta.
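As a sanity check, here is a short Python sketch (my own example, not from the original post) for a function with continuous partials, f(x,y)=x^2y+y^3: the quantity \epsilon(h)=\frac{f(a+h)-f(a)-Dh}{\|h\|} shrinks as \|h\|\to 0, exactly as the definition demands.

```python
import math

def f(x, y):
    return x * x * y + y ** 3

def grad_f(x, y):
    return (2 * x * y, x * x + 3 * y * y)      # (f_x, f_y), both continuous

a, b = 1.0, 2.0
fx, fy = grad_f(a, b)

for scale in [1e-1, 1e-2, 1e-3, 1e-4]:
    hx, hy = 0.6 * scale, -0.8 * scale          # a fixed direction, shrinking length
    norm_h = math.hypot(hx, hy)
    # eps(h) = (f(a+h) - f(a) - D h) / ||h||, which should tend to 0 with ||h||
    eps = (f(a + hx, b + hy) - f(a, b) - (fx * hx + fy * hy)) / norm_h
    print(f"||h|| = {norm_h:.1e}, eps(h) = {eps:+.3e}")
```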

Now we come back to sufficient conditions for differentiability. Let a function f(x,y) be partially differentiable with respect to x and y at (a,b). This implies

f(a+\Delta x,b)-f(a,b)=\epsilon_1(\Delta x)\Delta x + f_x(a,b)\Delta x

and

f(a,b+\Delta y)-f(a,b)=\epsilon_2(\Delta y)\Delta y + f_y(a,b)\Delta y, where \epsilon_1(\Delta x)\to 0 as \Delta x\to 0 and \epsilon_2(\Delta y)\to 0 as \Delta y\to 0.

Adding these two, we get

f(a+\Delta x,b)+f(a,b+\Delta y)-2f(a,b)=\epsilon_1(\Delta x)\Delta x+\epsilon_2(\Delta y)\Delta y + f_x(a,b)\Delta x+f_y(a,b)\Delta y.

Can we say

f(a+\Delta x,b)-f(a,b)\approx f(a+\Delta x,b+\Delta y)-f(a,b+\Delta y),

assuming \Delta x and \Delta y are small enough?

Now we use the property that the partial derivatives are continuous.

Because f_x is continuous, the quotients \frac{f(a+\Delta x,b)-f(a,b)}{\Delta x} and \frac{f(a+\Delta x,b+\Delta y)-f(a,b+\Delta y)}{\Delta x} differ by a quantity that tends to 0 as (\Delta x,\Delta y)\to (0,0). Multiplying through by \Delta x, we get f(a+\Delta x,b)-f(a,b)=f(a+\Delta x,b+\Delta y)-f(a,b+\Delta y)+o(h).

o(h) is a correction term that goes to 0 faster than h. Remember that here h=\Delta x or \Delta y. It can’t be a combination of both, as only the partial derivatives f_x,f_y are continuous.
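To see this step numerically, here is a quick Python check (again with an example of my own choosing, f(x,y)=x^2y+y^3, which has continuous partials): the gap between f(a+\Delta x,b)-f(a,b) and f(a+\Delta x,b+\Delta y)-f(a,b+\Delta y) shrinks faster than h, i.e. it is o(h).

```python
def f(x, y):
    return x * x * y + y ** 3

a, b = 1.0, 2.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx = dy = h
    # the gap between the two increments discussed above; gap / h should tend to 0
    gap = (f(a + dx, b) - f(a, b)) - (f(a + dx, b + dy) - f(a, b + dy))
    print(f"h = {h:.0e}: gap = {gap:+.3e}, gap / h = {gap / h:+.3e}")
```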

We now have the formula

f(a+\Delta x,b+\Delta y)-f(a,b)=f_x(a,b)\Delta x+f_y(a,b)\Delta y+\epsilon_1(\Delta x)\Delta x+\epsilon_2(\Delta y)\Delta y+o(h).

The rest of the proof is elementary, and can be found in any complex analysis textbook (pg. 67 of Complex Variables and Applications, Brown and Churchill). I have only explained the difficult step in the proof.

Reiterating the theorem: if f is partially differentiable with respect to its independent variables at a particular point, and all those partial derivatives are continuous there, then f is differentiable at that point.

Lonely Runner Conjecture- II

The Lonely Runner conjecture concerns k runners on a circular track of unit length, all starting from the same point at time 0 and running at pairwise distinct constant speeds; a runner is said to be lonely at time t if every other runner is at (circular) distance at least \frac{1}{k} from them. The conjecture states that each runner is lonely at some point in time. Let the speeds of the runners be \{a_1,a_2,\dots,a_k\}, and let us prove “loneliness” for the runner with speed a_e.

As we know, the circular distance between the runners with speeds a_i and a_e is given by |a_e-a_i|t for t\leq \frac{1}{2|a_e-a_i|} and by 1-|a_e-a_i|t for \frac{1}{2|a_e-a_i|}\leq t\leq \frac{1}{|a_e-a_i|}, where t stands for time; this pattern then repeats. Also, speed\times time=distance.

We need |a_e-a_i|t\geq \frac{1}{k}, i.e. t\geq \frac{1}{k|a_e-a_i|}, on the first half-period, and 1-|a_e-a_i|t\geq \frac{1}{k}, i.e. t\leq \left (1-\frac{1}{k}\right )\frac{1}{|a_e-a_i|}, on the second half-period (these windows are non-empty since \frac{1}{k}\leq\frac{1}{2} for k\geq 2). Finally, observing that this is a periodic process with period \frac{1}{|a_e-a_i|}, we come to the conclusion that a_e is lonely with respect to a_i exactly when t\in \left[\frac{1}{k|a_e-a_i|}+\frac{n_i}{|a_e-a_i|}, \left (1-\frac{1}{k}\right )\frac{1}{|a_e-a_i|}+\frac{n_i}{|a_e-a_i|}\right], where n_i\in\Bbb{Z}_{\geq 0}.

Now we prove the loneliness of a_e with respect to every other runner. This is equivalent to the statement

\bigcap_{i\in A} \left[\frac{1}{k|a_e-a_i|}+\frac{n_i}{|a_e-a_i|}, \left (1-\frac{1}{k}\right )\frac{1}{|a_e-a_i|}+\frac{n_i}{|a_e-a_i|}\right]\neq\emptyset for some choice of the n_i, where A=\{1,2,3\dots,k\}\setminus\{e\}. Also, note that the n_i can take different integral values for different i.
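Here is a small Python sketch (my own illustration, with a toy set of speeds chosen by me) of what the statement above asks for: the periodic windows derived above for each pair, and a brute-force search for a time lying in all of them.

```python
from fractions import Fraction

def first_windows(a_e, a_i, k, how_many=3):
    """The first few intervals on which the runner with speed a_e is at distance
    at least 1/k from the runner with speed a_i, as derived above."""
    d = abs(a_e - a_i)
    period = Fraction(1, d)
    return [(Fraction(1, k * d) + n * period,
             (1 - Fraction(1, k)) * Fraction(1, d) + n * period)
            for n in range(how_many)]

def is_lonely_at(t, speeds, e, k):
    """Direct check: is runner e at circular distance >= 1/k from everyone else?"""
    for i, a in enumerate(speeds):
        if i == e:
            continue
        x = (abs(speeds[e] - a) * t) % 1
        if min(x, 1 - x) < Fraction(1, k):
            return False
    return True

speeds = [0, 1, 3, 4]                      # k = 4 runners (a toy example)
k, e = len(speeds), 0
for i in range(1, k):
    print(f"windows against runner {i}:", first_windows(speeds[e], speeds[i], k))

# brute-force search over a grid of times for a moment when runner e is lonely
for step in range(1, 5000):
    t = Fraction(step, 1000)
    if is_lonely_at(t, speeds, e, k):
        print("runner", e, "is lonely at t =", t)
        break
```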

Of ellipses, hyperbolae and mugging

For as long as I can remember, I have had unnatural inertia in studying coordinate geometry. It seemed to be a pursuit of rote learning and regurgitating requisite formulae, which is something I detested. My refusal to “mug up” formulae cost me heavily in my engineering entrance exams, and I was rather proud of myself for having stuck to my ideals in spite of not getting into the college of my dreams.

However, now I realise what useful entities ellipses and hyperbolae are in reality. Hence, as a symbolic gesture, I will derive the formulae of both the ellipse and the hyperbola in the simplest setting: that of the centre being at the origin (0,0).

1. Ellipse- The sum of distances from the two foci is constant. Let the sum be “L“. As the centre is at the origin, and we are free to take the foci along the x-axis, the coordinates of the foci are (-c,0) and (c,0). We thus have the equation \sqrt{(x-c)^2 +y^2}+\sqrt{(x+c)^2+ y^2}=L. On simplifying this, we get \frac{x^2}{a^2}+\frac{y^2}{b^2}=1, where a^2=\frac{L^2}{4} and b^2=\frac{L^2-4c^2}{4}.

2. In the case of a hyperbola, under similar conditions, we have the equation \left|\sqrt{(x-c)^2 +y^2}-\sqrt{(x+c)^2+ y^2}\right|=L. This under simplification gives \frac{x^2}{a^2}-\frac{y^2}{b^2}=1, where a^2=\frac{L^2}{4} and b^2=\frac{4c^2-L^2}{4}.
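As a quick check of the ellipse formula (a numerical illustration of my own, not part of the derivation): if a^2=\frac{L^2}{4} and b^2=\frac{L^2-4c^2}{4}, then points on \frac{x^2}{a^2}+\frac{y^2}{b^2}=1 should indeed have distances to the foci (\pm c,0) summing to L.

```python
import math

L, c = 10.0, 3.0                       # any L > 2c will do
a = L / 2
b = math.sqrt(L * L - 4 * c * c) / 2

for t in [0.0, 0.7, 1.9, 3.0]:         # a few points on the ellipse, parametrized by angle
    x, y = a * math.cos(t), b * math.sin(t)
    d1 = math.hypot(x - c, y)          # distance to the focus (c, 0)
    d2 = math.hypot(x + c, y)          # distance to the focus (-c, 0)
    print(f"t = {t}: d1 + d2 = {d1 + d2:.10f} (should equal L = {L})")
```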

The utility of trigonometrical substitutions

Today we will discuss the power of trigonometrical substitutions.

Let us take the expression \frac{\sum_{k=1}^{2499} \sqrt{10+\sqrt{50+\sqrt{k}}}}{\sum_{k=1}^{2499} \sqrt{10-\sqrt{50+\sqrt{k}}}}

This is a math competition problem. One solution proceeds this way: let p_k=\sqrt{50+\sqrt{k}}, q_k=\sqrt{50-\sqrt{k}}. Then, as p_k^2+q_k^2=10^2, we can write p_k=10\cos x_k and q_k=10\sin x_k for some x_k\in\left[0,\frac{\pi}{2}\right].

This is an elementary fact. But what is the reason for doing so?

Now we have a_k=\sqrt{10+\sqrt{50+\sqrt{k}}}=\sqrt{10+10\cos x_k}=\sqrt{20}\cos \frac{x_k}{2}. Similarly, b_k=\sqrt{10-\sqrt{50+\sqrt{k}}}=\sqrt{10-10\cos x_k}=\sqrt{20}\sin \frac{x_k}{2}. The rest of the solution can be seen here. It mainly uses identities of the form 1+2\sin A\cos A=(\sin A+\cos A)^2 to remove the root sign.
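A quick numerical sanity check in Python (my own addition): with x_k=\arccos\frac{\sqrt{50+\sqrt{k}}}{10}, the identities a_k=\sqrt{20}\cos\frac{x_k}{2} and b_k=\sqrt{20}\sin\frac{x_k}{2} hold, and the ratio in the problem can be evaluated directly.

```python
import math

for k in [1, 100, 1234, 2499]:
    p = math.sqrt(50 + math.sqrt(k))
    x = math.acos(p / 10)
    a_k = math.sqrt(10 + p)
    b_k = math.sqrt(10 - p)
    # both differences should be numerically zero
    print(k, a_k - math.sqrt(20) * math.cos(x / 2), b_k - math.sqrt(20) * math.sin(x / 2))

num = sum(math.sqrt(10 + math.sqrt(50 + math.sqrt(k))) for k in range(1, 2500))
den = sum(math.sqrt(10 - math.sqrt(50 + math.sqrt(k))) for k in range(1, 2500))
print("ratio =", num / den)
```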

What if we did not use trigonometric substitutions? What is the utility of this method?

We will refer to this solution, and try to determine whether we’d have been able to solve the problem, using the same steps, but not using trigonometrical substitutions.

a_{2500-k}=\sqrt{10+\sqrt{50+\sqrt{2500-k}}}=\sqrt{10+\sqrt{50+\sqrt{(50+\sqrt{k})(50-\sqrt{k})}}}

=\sqrt{10+\sqrt{50+100\frac{\sqrt{50+\sqrt{k}}}{10}\frac{\sqrt{50-\sqrt{k}}}{10}}}=\sqrt{10+\sqrt{50}\sqrt{1+2\frac{\sqrt{50+\sqrt{k}}}{10}\frac{\sqrt{50-\sqrt{k}}}{10}}}
=\sqrt{10+10(\frac{\sqrt{50+\sqrt{k}}}{10\sqrt{2}}+\frac{\sqrt{50-\sqrt{k}}}{10\sqrt{2}})}=\sqrt{10+10\times 2\times\frac{\frac{1}{2}(\frac{\sqrt{50+\sqrt{k}}}{10\sqrt{2}}+\frac{\sqrt{50-\sqrt{k}}}{10\sqrt{2}})}{\sqrt{\frac{50+\sqrt{k}}{20\sqrt{2}}-\frac{50-\sqrt{k}}{20\sqrt{2}}+\frac{1}{2}}}\times \sqrt{\frac{50+\sqrt{k}}{20\sqrt{2}}-\frac{50-\sqrt{k}}{20\sqrt{2}}+\frac{1}{2}}}

As one might see here, our main aim is to remove the square-root radicals, and forming squares becomes much easier when you have trigonometrical expressions. Every trigonometrical expression has a counterpart in a complicated algebraic expression. It is only out of sheer habit that we’re more comfortable with trigonometrical expressions and their properties.

Let f:X\to Y be a mapping. We will prove that f^{-1}(Y-f(X-W))\subseteq W, with equality when f is injective. Note that f does not have to be closed, open, or even continuous for this to be true. It can be any mapping.

Let W\subseteq X. The image of W in Y is f(W). As for f(X-W), it may overlap with f(W), should the mapping not be injective. Hence, any point of Y-f(X-W) that lies in the image of f must lie in f(W); points of Y outside the image of f have empty preimage, so they will not matter below.

Taking f^{-1} on both sides, we get f^{-1}(Y-f(X-W))\subseteq W.

How can we take the inverse on both sides and determine this fact? Is the reasoning valid? Yes. Any point of X that maps into Y-f(X-W) cannot lie in X-W (its image would then lie in f(X-W)), so it must lie in W. However, there may be points of W that do not map into Y-f(X-W), which is why the inclusion can be strict when f is not injective.

Are there other analogous facts about mappings in general? In Y, select two sets A and B such that A\subseteq B. Then f^{-1}(A)\subseteq f^{-1}(B).
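Here is a tiny finite example in Python (my own illustration) of both claims: f^{-1}(Y-f(X-W))\subseteq W always, with equality when f is injective.

```python
def preimage(f, X, T):
    """f^{-1}(T), computed inside the domain X; f is given as a dict."""
    return {x for x in X if f[x] in T}

X = {1, 2, 3, 4}
Y = {'a', 'b', 'c', 'd'}
W = {1, 2}

f_noninj = {1: 'a', 2: 'b', 3: 'b', 4: 'c'}    # not injective: 2 and 3 collide
f_inj    = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}    # injective

for name, f in [("non-injective", f_noninj), ("injective", f_inj)]:
    image_of_complement = {f[x] for x in X - W}          # f(X - W)
    lhs = preimage(f, X, Y - image_of_complement)        # f^{-1}(Y - f(X - W))
    print(f"{name}: f^-1(Y - f(X-W)) = {lhs}, subset of W: {lhs <= W}, equals W: {lhs == W}")
```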

Axiom of Choice- a layman’s explanation.

Say you’re given the set \{1,2,3,\dots,n\}, and asked to choose a number. Any number. You may choose 1, 2, or anything else that you feel like from the set. Now suppose you’re given a set S, and you have absolutely no idea about what points S contains. In this case, you can’t visualize the points in S and pick any you feel like. You might say “pick the least upper bound of S“. However, what if S is not ordered? What if it does not even contain numbers? How do you select a point from a set when you can’t even describe any of the points in that set?

Here, the Axiom of Choice comes in. It states that if \mathfrak{B} is a collection of non-empty, pairwise disjoint subsets of S, then there exists a function c: \mathfrak{B}\to \bigcup\limits_{B\in\mathfrak{B}}B such that c(B)\in B for every B\in\mathfrak{B}; in other words, one point from every disjoint set is selected. You can divide S into disjoint sets in any manner whatsoever, and get one point from each set. In fact, the disjoint sets don’t necessarily have to cover S. The condition is only that \bigcup\limits_{B\in\mathfrak{B}}B\subseteq S.

Going by the above explanation, you may take each point in S to be its own singleton set, and hence select the whole of S.

What is so special about this seemingly obvious remark? Selecting points from a set has always been defined by a knowledge of the set and its points. For example, if I define f:\Bbb{R}\to\Bbb{R} by f(x)=x^2, I have selected the points with the information that points in \Bbb{R} can be squared, and that the squares lie inside \Bbb{R}, etc. If we had f:A\to \Bbb{R} where A is a set of teddy bears, then f(x)=x^2 would not be defined. With the Axiom of Choice, regardless of whether A contains real numbers or teddy bears, we can select a bunch of points from it.

What if the sets B\in\mathfrak{B} are not all pairwise disjoint? We can still select a point from each B. Proof: Take the Cartesian product \mathfrak{B}\times S. You will get points of the form (B,x), where B\in\mathfrak{B} and x\in S. Now create disjoint sets in \mathfrak{B}\times S by taking sets of the form \tilde{B_i}=\{(B_i,x)\mid x\in B_i\}; in essence, you’re isolating the individual sets of \mathfrak{B}. These are disjoint: even if the same x appears in two of them, the first coordinates B are different. The main purpose of taking this product was to create disjoint sets out of the overlapping sets in S, so that we could apply the Axiom of Choice. Now we use the choice function to collect one point from each disjoint set. Each of these points will be of the form (B,x). Now we define a function g:\mathfrak{B}\times S\to S as g((B,x))=x. Hence, we have collected one point from each B\in\mathfrak{B}. Note that these points may well overlap. We formed the product just to be able to apply the Axiom of Choice; if the Axiom of Choice were stated for overlapping sets, we wouldn’t have to form the product at all.
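A small Python illustration of this trick (the particular overlapping sets are my own choice): tagging each element with its parent set makes the family pairwise disjoint, and projecting back recovers one element from each original set.

```python
frak_B = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]   # overlapping sets

# tag each element with the set it came from: these tagged sets are pairwise disjoint
tagged = [{(B, x) for x in B} for B in frak_B]
print(all(not (tagged[i] & tagged[j])
          for i in range(len(tagged)) for j in range(i + 1, len(tagged))))

# pick one pair from each tagged set (a stand-in for the choice function c),
# then project with g((B, x)) = x to get one element from each original set
chosen = [min(T, key=lambda pair: pair[1]) for T in tagged]
print([x for (B, x) in chosen])
```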

Now we come to the reason why this article was written: defining an injective function f:\Bbb{Z}_+\to S, where S is an infinite set; the image of f is then a countably infinite subset of S. We don’t know whether S is countable or not.

OK, first I’d like to give you the following “proof”, and you should tell me the flaw in it. We use induction. Say f(1)=a_1\in S. We form S-\{a_1\}, and pick a_2\in S-\{a_1\}. We assume f(n) equals a point a_n\in S-\{a_1,a_2,\dots,a_{n-1}\}, and then pick f(n+1) in S-\{a_1,a_2,\dots,a_n\}. Why can we not do this? Because we know nothing about the points a_1,a_2,\dots! How can we possibly map 1 to a point without being able to specify which point we’re mapping it to?

The bad news is we might never know the properties of S. The good news is we can still work around it. Select \mathfrak{B} to be the set of all non-empty subsets of S (there are 2^{|S|} subsets in all). The elements of \mathfrak{B} are not pairwise disjoint. However, we can still select a point from every set, as has been proven above (by taking the Cartesian product to create disjoint sets, etc.). The most brilliant part of the proof is this: take c(S). We know S\in \mathfrak{B}. This gives you one element from the whole of S. You need to know nothing about S to select an element from S. Let f(1)=c(S). Now take S-\{f(1)\}. We know S-\{f(1)\}\in \mathfrak{B}. Let f(2)=c(S-\{f(1)\}). Continuing this pattern, we have f(n)=c(S-\{f(1),f(2),\dots,f(n-1)\}). By induction, f is defined on the whole of \Bbb{Z}_+, and it is injective because f(n) is always chosen from a set that excludes all the earlier values.
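Here is a minimal Python sketch of the recursion above (my own illustration): for an abstract S the choice function c is handed to us by the Axiom of Choice, so here I fake it with a concrete rule (taking the minimum) on a concrete finite stand-in for S, just to show the pattern f(n)=c(S-\{f(1),\dots,f(n-1)\}) producing distinct values.

```python
def build_injection(S, c, n_terms):
    """f(n) = c(S - {f(1), ..., f(n-1)}) for n = 1, ..., n_terms."""
    values = []
    remaining = set(S)
    for n in range(1, n_terms + 1):
        x = c(remaining)             # c picks one element of any non-empty subset
        values.append(x)             # this is f(n)
        remaining = remaining - {x}  # f(n+1) will be chosen from a strictly smaller set
    return values

S = {10, 3, 7, 42, 5, 18, 99, 1}            # finite stand-in for an infinite set
print(build_injection(S, min, 5))           # f(1), ..., f(5): all distinct by construction
```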