
### (Part of) a proof of Sard’s Theorem

I have always wanted to prove Sard’s Theorem. Now I shall stumble my way into proving a deeply unsatisfying special case of it, after a whole day of dead ends and red herrings.

Consider first the special case of a smooth function $f:\Bbb{R}\to\Bbb{R}$. At first, I thought that the set of critical points of such a function has to be countable. The set of critical values would then also be countable, and hence of measure $0$. However, our resident pathological example, the Cantor set, makes things difficult. It turns out that the set of critical *points* can not only be uncountable, but can even have non-zero measure (the canonical example of such a smooth function involves a fat Cantor set, a modified Cantor set of non-zero measure). In fact, even the much humbler constant function has a set of critical points of positive measure. However, the set of critical *values* may still have measure $0$, and it indeed does.

For $f:\Bbb{R}\to\Bbb{R}$, consider the restriction of $f$ to $[a,b]\subset \Bbb{R}$. The measure of the set of critical points of $f$ in $[a,b]$ is finite (possibly $0$), and $f'$ is bounded on $[a,b]$. Hence, at each critical *point* $p\in[a,b]$, given $\epsilon>0$, there exists a $\delta(\epsilon)>0$ such that if $N(p)$ is an interval around $p$ with $m(N(p))<\delta(\epsilon)$, then $m(f(N(p)))<\epsilon$. This is just another way of saying that we can control the measure of the image.

The reason why I am writing $\delta(\epsilon)$ is that I want to emphasize the behaviour of the ratio $\frac{\epsilon}{\delta(\epsilon)}$. As $p$ is a critical point, $\lim\limits_{\epsilon\to 0}\frac{\epsilon}{\delta(\epsilon)}=0$: this comes from the very definition of the derivative being $0$ at $p$.

Divide the interval $[a,b]$ into cubes (here, subintervals) of length $\delta(\epsilon)$. Retain only those cubes which contain at least one critical point, and discard the rest. Let $A$ be the union of the retained cubes. Then $m(f(A))\leq \text{number of cubes}\times\epsilon$, and the number of retained cubes is $\frac{m(A)}{\delta(\epsilon)}$. Hence, $m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}$. Note that $f(A)$ contains all the critical values of $f|_{[a,b]}$.

As $\epsilon\to 0$, we can repeat this whole process verbatim. Everything remains essentially the same, except that $\frac{\epsilon}{\delta(\epsilon)}\to 0$ while $m(A)\leq b-a$ stays bounded. Hence, $m(f(A))\leq m(A)\frac{\epsilon}{\delta(\epsilon)}\to 0$. This proves that the set of critical values has measure $0$ when $f$ is restricted to $[a,b]$.

Now when we consider $f$ over the whole of $\Bbb{R}$, we can subdivide it as $\cup_n [n,n+1]$, note that the set of critical values coming from each interval has measure $0$, and conclude that the set of critical values of $f$ over the whole of $\Bbb{R}$, being a countable union of measure-zero sets, also has measure $0$.

Note that this argument can be generalized to any $f:\Bbb{R}^n\to \Bbb{R}^n$.

Also, the case for $f:\Bbb{R}^m\to \Bbb{R}^n$ where $m<n$ is trivial, as the image of $\Bbb{R}^m$ itself should have measure $0$.
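The covering argument above can be illustrated numerically. Here is a sketch (not a proof, and the cell-counting is only approximate): take $f=\sin$ on $[0,2\pi]$, cover the interval with cells of width $\delta$, keep only the cells containing a critical point, and bound the measure of the image of the retained cells. The bound shrinks as $\delta$ does.

```python
import numpy as np

# Numerical illustration of the covering argument (a sketch, not a proof):
# cover [a, b] with cells of width delta, keep the cells containing a
# critical point of f, and bound the measure of the image of those cells.
# Here f = sin on [0, 2*pi], with critical points at pi/2 and 3*pi/2.

a, b = 0.0, 2 * np.pi

def image_measure_bound(delta):
    edges = np.arange(a, b + delta, delta)
    bound = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # the cell contains a critical point iff f' = cos changes sign on it
        if np.cos(lo) * np.cos(hi) <= 0:
            xs = np.linspace(lo, hi, 50)
            bound += np.max(np.sin(xs)) - np.min(np.sin(xs))  # ~ m(f(cell))
    return bound

for delta in [0.5, 0.05, 0.005]:
    print(delta, image_measure_bound(delta))
```

The printed bound on the measure of the set of critical values drops rapidly with $\delta$, mirroring the $m(A)\frac{\epsilon}{\delta(\epsilon)}\to 0$ estimate.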

### Furstenberg’s topological proof of the infinitude of primes

Furstenberg and Margulis won the Abel Prize today. In honor of this, I spent the better part of the evening trying to reconstruct Furstenberg’s topological proof of the infinitude of primes. I was going down the wrong road at first, but after ample hints from Wikipedia and elsewhere, I was able to come up with Furstenberg’s original argument.

Furstenberg’s argument: Consider $\Bbb{N}$ with the topology in which the open sets are generated by the arithmetic progressions $\{a+b\Bbb{N}\}$, where $a,b\in\Bbb{N}$. It is easy to see that each such progression is also closed. Nonempty open sets, being unions of these infinite generators, have to be infinite. However, if there were only finitely many primes $p_1,\dots,p_n$, then $\Bbb{N}\setminus (\cup_i \{p_i\Bbb{N}\})=\{1\}$ would be open (as the complement of a finite union of closed sets) and finite, which is a contradiction.
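The set identity at the heart of the argument can be sanity-checked on a finite range: every integer $n\geq 2$ lies in $p\Bbb{N}$ for some prime $p$, so deleting all multiples of all primes from $\{1,\dots,N\}$ leaves exactly $\{1\}$.

```python
# Finite sanity check of the set identity behind Furstenberg's argument:
# removing every multiple of every prime from {1, ..., N} leaves {1}.

N = 10_000

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

multiples = set()
for p in primes_up_to(N):
    multiples.update(range(p, N + 1, p))

leftover = set(range(1, N + 1)) - multiples
print(leftover)  # {1}
```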

My original flawed proof: Suppose the sets $\{a+b\Bbb{N}\}$ are connected in this topology. Then, as one can see clearly, $\Bbb{N}=\{2\Bbb{N}\}\cup\{1+2\Bbb{N}\}$; in other words, it is the union of two disjoint open sets. Therefore, it is not connected. If the number of primes is finite, then $\cap_i \{p_i\Bbb{N}\}=\{p_1p_2\dots p_n\Bbb{N}\}$, which is itself an open connected set. Hence, as all the $\{p_i\Bbb{N}\}$ have a non-empty intersection which is open and connected, the union of all such open sets $\cup_i \{p_i\Bbb{N}\}$ must lie in a single connected component. This contradicts the fact that $\cup_i\{p_i\Bbb{N}\}=\Bbb{N}$.

This seemed too good to be true. Upon thinking further, we realize that our original assumption was wrong: $\{a+b\Bbb{N}\}$ can never be a connected set, as it can itself be written as a union of disjoint open sets (splitting the progression according to the parity of the coefficient of $b$, say), and in infinitely many ways. This topology on $\Bbb{N}$ is bizarrely disconnected.

### Proving inequalities using convex functions

I have found that I am pretty bad at finding “clever factors” for Cauchy-Schwarz, whose bounds can be known from the given conditions. However, I am slowly getting comfortable with the idea of converting the expression into a convex function, and then using the Majorization Theorem.

(Turkey) Let $n\geq 2$, and let $x_1,x_2,\dots,x_n$ be positive reals such that $x_1^2+x_2^2+\dots+x_n^2=1$. Find the minimum value of $\sum\limits_{i=1}^n \frac{x_i^5}{\sum\limits_{j\neq i} x_j}$.

My proof: We have $(x_1^2+1)+\dots+(x_n^2+1)=n+1$. Hence, using the AM-GM inequality ($x_i^2+1\geq 2x_i$), we have $x_1+\dots+x_n\leq \frac{n+1}{2}$.

Since $\sum_{j\neq i} x_j\leq \frac{n+1}{2}-x_i$, the expression we finally get is

$\sum\limits_{i=1}^n \frac{x_i^5}{\sum\limits_{j\neq i} x_j}\geq \sum\limits_{i=1}^n \frac{x_i^5}{\frac{n+1}{2}-x_i}$

Consider the function $f(x)=\frac{x^{5/2}}{\frac{n+1}{2}-\sqrt{x}}$. By differentiating twice, we know that for $x\leq 1$, this function is convex.

Hence, subject to $x_1^2+\dots+x_n^2=1$, the sum $f(x_1^2)+\dots+f(x_n^2)$ is minimized when $x_1^2=\dots=x_n^2=\frac{1}{n}$.

Note that

$f(x_1^2)+\dots+f(x_n^2)= \sum\limits_{i=1}^n \frac{x_i^5}{\frac{n+1}{2}-x_i}$

Hence, the minimum is attained when $x_1=\dots=x_n=\frac{1}{\sqrt{n}}$, and is equal to $\frac{1}{n(n-1)}$, which is found by substituting $x_1=\frac{1}{\sqrt{n}}$ in the original expression.
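As a quick Monte Carlo sanity check (a sketch, not a proof): for random positive $x_1,\dots,x_n$ rescaled so the squares sum to $1$, the expression should never dip below $\frac{1}{n(n-1)}$, and the equal values $x_i=\frac{1}{\sqrt{n}}$ should attain it.

```python
import numpy as np

# Monte Carlo sanity check of the claimed minimum 1/(n(n-1)).

rng = np.random.default_rng(0)
n = 5
claimed_min = 1 / (n * (n - 1))

def expr(x):
    s = x.sum()
    return float(np.sum(x ** 5 / (s - x)))  # sum_i x_i^5 / (sum_{j != i} x_j)

x_equal = np.full(n, 1 / np.sqrt(n))
equal_value = expr(x_equal)

# random points on the positive part of the unit sphere
worst = min(expr(x / np.linalg.norm(x)) for x in rng.random((2000, n)))
print(equal_value, claimed_min, worst)
```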

### A beautiful generalization of the Nesbitt Inequality

I want to discuss a beautiful inequality, that is a generalization of the famous Nesbitt inequality:

(Romanian TST) For positive $a,b,x,y,z$, prove that $\frac{x}{ay+bz}+\frac{y}{az+bx}+\frac{z}{ax+by}\geq \frac{3}{a+b}$.

Clearly, if $a=b$, then we get Nesbitt’s inequality, which states that

$\frac{x}{y+z}+\frac{y}{z+x}+\frac{z}{x+y}\geq \frac{3}{2}$.

This is question 14 on Mildorf’s “Olympiad Inequalities”, and its solution comprises finding a factor to multiply this expression with, almost out of thin air, and then using the Cauchy-Schwarz and AM-GM inequalities to prove the assertion. My solution is the following:

On interchanging $a$ and $b$, the right hand side remains the same, and the claimed inequality becomes

$\frac{x}{az+by}+\frac{y}{ax+bz}+\frac{z}{ay+bx}\geq \frac{3}{a+b}$

On adding these two inequalities, we get

$\frac{x}{ay+bz}+\frac{x}{az+by}+\frac{y}{az+bx}+\frac{y}{ax+bz}+\frac{z}{ax+by}+\frac{z}{ay+bx}\geq \frac{6}{a+b}$

Multiplying both sides by $\frac{1}{2}(a+b)$ and then adding $6$ on both sides, we get

$2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})\geq 9$

This is obviously true by Cauchy Schwarz. We will explain below how we got this expression.

Let us see what happens to $\frac{x}{ay+bz}+\frac{x}{az+by}$ in some detail. After multiplying by $\frac{1}{2}(a+b)$ and adding $2$, we get

$\frac{\frac{1}{2}(a+b)x+ay+bz}{ay+bz}+\frac{\frac{1}{2}(a+b)x+az+by}{az+by}\geq 2\frac{(a+b)(x+y+z)}{(a+b)(y+z)}=2\frac{(x+y+z)}{y+z}$.

EDIT: I assumed that this was obviously true. However, it is slightly non-trivial. For $\frac{a}{b}+\frac{c}{d}\geq 2\frac{a+c}{b+d}$ to hold, the condition required is $(b-d)(bc-ad)\geq 0$. This is true in our case above: the two numerators exceed their respective denominators by the same amount $s=\frac{1}{2}(a+b)x$, so $bc-ad=s(b-d)$ and the condition becomes $s(b-d)^2\geq 0$.

After adding the other terms also, we get

$2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})$

As pointed above, this is clearly $\geq 9$ by Cauchy-Schwarz.

Hence proved.

Note: For the sticklers saying this isn’t a rigorous proof, a rigorous proof would entail us assuming that

$\frac{x}{ay+bz}+\frac{y}{az+bx}+\frac{z}{ax+by}< \frac{3}{a+b}$, and then deriving a contradiction by proving that

$2(x+y+z)(\frac{1}{x+y}+\frac{1}{y+z}+\frac{1}{z+x})< 9$, which is obviously false.
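A quick numerical sanity check of the generalized Nesbitt inequality for random positive $a,b,x,y,z$ (a sketch, not a proof):

```python
import random

# Numerical sanity check: the gap LHS - 3/(a+b) should never be negative.

random.seed(1)

def lhs(a, b, x, y, z):
    return x / (a * y + b * z) + y / (a * z + b * x) + z / (a * x + b * y)

worst_gap = min(
    lhs(a, b, x, y, z) - 3 / (a + b)
    for a, b, x, y, z in (
        [random.uniform(0.1, 10) for _ in range(5)] for _ in range(10_000)
    )
)
print(worst_gap)  # never negative (up to rounding)
```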

### A small note on re-defining variables to prove inequalities

I just want to record my solution to the following problem, as it is different from the one given online.

For $a,b,c,d$ positive real numbers, prove that $\frac{1}{a}+\frac{1}{b}+\frac{4}{c}+\frac{16}{d}\geq \frac{64}{a+b+c+d}$

This has a fairly straightforward solution using the Cauchy-Schwarz inequality, which for some reason I did not think of.

The way that I solved it is that I re-defined the variables: let $a=8a', b=8b', c=16c'$ and $d=32 d'$. Then this is equivalent to proving that $\frac{1}{8}\frac{1}{a'}+\frac{1}{8}\frac{1}{b'}+\frac{1}{4}\frac{1}{c'}+\frac{1}{2}\frac{1}{d'}\geq \frac{1}{\frac{a'}{8}+\frac{b'}{8}+\frac{c'}{4}+\frac{d'}{2}}$

This is easily seen to be a consequence of Jensen’s inequality, as $\frac{1}{x}$ is a convex function for positive $x$.
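The substitution can be checked mechanically: the weights $\frac18,\frac18,\frac14,\frac12$ sum to $1$, and the weighted form is term-by-term identical to the original inequality. A quick numerical sketch (the primed variables $a',b',c',d'$ are written `ap, bp, cp, dp` below):

```python
import random

# Check that the substitution a = 8a', b = 8b', c = 16c', d = 32d' turns the
# original inequality into the weighted Jensen form, term by term.

random.seed(2)

max_diff = 0.0            # |original - weighted| over the samples
min_gap = float("inf")    # smallest value of LHS - RHS of the original

for _ in range(1000):
    a, b, c, d = (random.uniform(0.1, 10) for _ in range(4))
    original = 1 / a + 1 / b + 4 / c + 16 / d - 64 / (a + b + c + d)
    ap, bp, cp, dp = a / 8, b / 8, c / 16, d / 32
    weighted = (
        (1 / 8) / ap + (1 / 8) / bp + (1 / 4) / cp + (1 / 2) / dp
        - 1 / (ap / 8 + bp / 8 + cp / 4 + dp / 2)
    )
    max_diff = max(max_diff, abs(original - weighted))
    min_gap = min(min_gap, original)

print(max_diff, min_gap)
```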

### A proof of Muirhead’s Inequality

I’ve been reading Thomas Mildorf’s Olympiad Inequalities, and trying to prove the 12 Theorems stated at the beginning. I’m recording my proof of Muirhead’s Inequality below. Although it is probably known to people working in this area, I could not find it on the internet.

Muirhead’s Inequality states the following: if the sequence $a_1,\dots,a_n$ majorizes the sequence $b_1,\dots,b_n$, then for nonnegative numbers $x_1,\dots,x_n$, we have the following inequality

$\sum\limits_{\text{sym}} x_1^{a_1}x_2^{a_2}\dots x_n^{a_n}\geq \sum\limits_{\text{sym}}x_1^{b_1}x_2^{b_2}\dots x_n^{b_n}$

We will derive it as an easy consequence of the fact that for a convex function $f$, if the sequence $\{a_i\}$ majorizes the sequence $\{b_i\}$, then $f(a_1)+\dots+f(a_n)\geq f(b_1)+\dots+f(b_n)$. This is Theorem 9 in Mildorf’s document (the Majorization Theorem).

We will prove the Muirhead Inequality by induction on the number of $x_i$’s. For a single $x_1$, it is true by the Majorization Theorem. Now assume that it is true for $n-1$ variables. Then the statement of Muirhead’s Inequality reduces to the inequality

$(x_1^{a_1}+x_1^{a_2}+\dots x_1^{a_n})(x_2^{a_1}+x_2^{a_2}+\dots x_2^{a_n})\dots (x_n^{a_1}+x_n^{a_2}+\dots x_n^{a_n})\geq (x_1^{b_1}+x_1^{b_2}+\dots x_1^{b_n})(x_2^{b_1}+x_2^{b_2}+\dots x_2^{b_n})\dots (x_n^{b_1}+x_n^{b_2}+\dots x_n^{b_n})$

The above is true because $f_i(y)=x_i^y$ is a convex function of $y$, being an exponential function. Hence, $f_i(a_1)+\dots+f_i(a_n)\geq f_i(b_1)+\dots+f_i(b_n)$ by the Majorization Theorem. Therefore, $x_i^{a_1}+\dots+x_i^{a_n}\geq x_i^{b_1}+\dots+x_i^{b_n}$.

It follows that

$(x_1^{a_1}+x_1^{a_2}+\dots x_1^{a_n})(x_2^{a_1}+x_2^{a_2}+\dots x_2^{a_n})\dots (x_n^{a_1}+x_n^{a_2}+\dots x_n^{a_n})\geq (x_1^{b_1}+x_1^{b_2}+\dots x_1^{b_n})(x_2^{b_1}+x_2^{b_2}+\dots x_2^{b_n})\dots (x_n^{b_1}+x_n^{b_2}+\dots x_n^{b_n})$

A note on the induction: Consider the base case, where we just have $x_1$. Then the inequality is equivalent to proving that

$x_1^{a_1}+x_1^{a_2}+\dots+x_1^{a_n}\geq x_1^{b_1}+x_1^{b_2}+\dots+x_1^{b_n}$, which is true by the Majorization Theorem (applied to the convex function $y\mapsto x_1^y$).

Now if we have two $x_i$‘s, say $x_1$ and $x_2$, then this is equivalent to the inequality $(x_1^{a_1}+\dots+x_1^{a_n})(x_2^{a_1}+\dots+x_2^{a_n})\geq (x_1^{b_1}+\dots+x_1^{b_n})(x_2^{b_1}+\dots+x_2^{b_n})$.

Expanding the products, we get the terms of the above inequality, plus “diagonal” terms of the form $(x_1x_2)^{a_1}+\dots+(x_1x_2)^{a_n}$, which we know from the Majorization Theorem dominate $(x_1x_2)^{b_1}+\dots+(x_1x_2)^{b_n}$.

The induction will work similarly for higher numbers of $x_i$‘s.
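The inequality itself can be sanity-checked numerically for a specific majorizing pair, say $(3,1,0)\succ(2,1,1)$, with the symmetric sums taken over all $n!$ permutations of the exponents (a quick sketch, not a proof):

```python
import itertools
import math
import random

# Numerical check of Muirhead for a = (3,1,0) majorizing b = (2,1,1).

random.seed(3)

def sym_sum(exponents, xs):
    # symmetric sum over all permutations of the exponent tuple
    return sum(
        math.prod(x ** e for x, e in zip(xs, perm))
        for perm in itertools.permutations(exponents)
    )

a, b = (3, 1, 0), (2, 1, 1)  # (3,1,0) majorizes (2,1,1)
min_gap = min(
    sym_sum(a, xs) - sym_sum(b, xs)
    for xs in ([random.uniform(0, 5) for _ in range(3)] for _ in range(5000))
)
print(min_gap)  # never negative (up to rounding)
```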

### A simpler way to obtain smooth functions than convolutions?

Of the many mathematical concepts that I don’t understand, one of the more important ones is the convolution of functions. It is defined in the following way:

$(f*g)(x)=\int_{-\infty}^{\infty} f(x-y)g(y) dy$

Our guiding principle should be that we want $*$ to behave like an abelian group operation (although inverses are not always present, at least when talking about integrable functions).

However, perhaps the reason why we thought of this action in the first place was that we wanted smooth functions out of just integrable functions. For instance, given any integrable function $f$, if $\phi(x)$ is a smooth compactly supported function, $(\phi*f)(x)=\int_{-\infty}^{\infty}\phi(x-y)f(y)dy$ will be smooth (provided we can bring the derivatives under the integral sign, which is related to the Dominated Convergence Theorem).

However, why have this complicated definition? Why not just consider the function $g(x)=\int_{-\infty}^{\infty}\phi(x)f(y)dy$? Clearly, if $\phi(x)$ is smooth, so is this one.

One of the reasons why we perhaps want the more complicated definition of $(f*g)(x)=\int_{-\infty}^{\infty} f(x-y)g(y) dy$ is that there does not exist a function $\phi(x)$ such that $\int_{-\infty}^{\infty} \phi(x)f(y)dy=f(x)$ for all functions $f(x)$: the simple recipe always returns a scalar multiple of $\phi$, namely $\phi(x)\int f$. Hence, there cannot exist an identity element for this operation on the integrable functions on the real line. Also, unlike convolution, this simple definition is clearly not commutative. I’d be interested in knowing your thoughts about what other purposes convolution serves, that this simple definition does not.
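Both points can be seen in a discrete numerical sketch: convolution is commutative, while the simpler recipe only remembers the integral of $f$ (this is a grid approximation, not a proof):

```python
import numpy as np

# Discrete sanity check: convolution is commutative, and the "simpler"
# recipe g(x) = phi(x) * integral(f) is just a scalar multiple of phi.

xs = np.linspace(-3, 3, 601)
dx = xs[1] - xs[0]

f = (np.abs(xs) < 1).astype(float)   # a non-smooth step
phi = np.exp(-xs ** 2 / 0.02)        # a narrow bump (approximate mollifier)
phi /= phi.sum() * dx                # normalize so that its integral is ~1

conv1 = np.convolve(f, phi, mode="same") * dx
conv2 = np.convolve(phi, f, mode="same") * dx
print(np.max(np.abs(conv1 - conv2)))  # commutativity: difference ~ 0

simple = phi * (f.sum() * dx)        # the "simpler" definition
# `simple` is a multiple of phi; it no longer resembles f at all,
# whereas conv1 is a smoothed-out version of the step f.
```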

### A more intuitive way of constructing bump functions

This is a short note on creating bump functions, test functions which are $1$ on the desired domain, etc. I will be working in one dimension. However, all these results can be generalized to higher dimensions by using polar coordinates.

As we know, the function $f(x)= e^{-\frac{1}{x}}$ for $x> 0$, with $f(x)=0$ for $x\leq 0$, is a smooth function. Hence, it is an ideal candidate for constructing smooth, compactly supported functions. If we wanted to construct a smooth function supported on $[a,b]$, then $f(x-a)f(b-x)$ is one such function.

However, the main difficulty is in constructing a bump function of the desired shape. How do we construct a bump function that is $\equiv 1$ on $[c,d]\subset [a,b]$? The idea that I had, which is different from the literature that I’ve consulted (including Lee’s “Smooth Manifolds”), is that we could consider the integrals of functions.

Consider $g(t)=\int_{-\infty}^{t} \Big(f(x-a)f(c-x)-f(x-d)f(b-x)\Big)\, dx$.

Basically, we are integrating a function that is positive precisely on $(a,c)$, minus the same kind of function supported on $[d,b]$ (normalized, if necessary, so that the two pieces have the same total integral).

This function vanishes for $t\leq a$, increases on $[a,c]$, is constant on $[c,d]$, and decreases back to $0$ on $[d,b]$. On re-scaling (multiplying by a constant), we can obtain a bump function supported on $[a,b]$ that is $1$ on $[c,d]$.
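Here is a numerical sketch of the construction, with hypothetical endpoints $a=0$, $c=1$, $d=2$, $b=3.5$; I normalize each piece by its own integral (needed when $[a,c]$ and $[d,b]$ have different lengths), so the plateau height comes out as exactly $1$ without further re-scaling:

```python
import numpy as np

# Numerical sketch of the bump-function construction above.

def f(x):
    # e^{-1/x} for x > 0, and 0 otherwise
    return np.where(x > 0, np.exp(-1 / np.maximum(x, 1e-12)), 0.0)

a, c, d, b = 0.0, 1.0, 2.0, 3.5

xs = np.linspace(-1, 5, 6001)
dx = xs[1] - xs[0]

h1 = f(xs - a) * f(c - xs)      # positive exactly on (a, c)
h2 = f(xs - d) * f(b - xs)      # positive exactly on (d, b)
h1 /= h1.sum() * dx             # each piece now integrates to 1
h2 /= h2.sum() * dx

# 0 before a, rises on [a,c], equals 1 on [c,d], falls back to 0 on [d,b]
bump = np.cumsum(h1 - h2) * dx

plateau = bump[(xs > c) & (xs < d)]
print(plateau.min(), plateau.max(), bump[-1])
```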

### A derivation of the Taylor expansion formula

I have tried, for long, to prove Taylor’s Theorem on my own. The only way that this proof is different from the hundreds of proofs online is that I have written $\int_0^a f'(x)dx$ as $\int_0^a f'(a-x)dx$. This solves a lot of the problems I was facing in developing the Taylor expansion.

Let $f\in C^{k+1}[0,a]$. Then we have

$f(a)-f(0)=\int_0^a{f'(x)dx}=\int_0^a{f'(a-x)dx}=xf'(a-x)|_0^a+\int_0^axf''(a-x)dx$.

We can easily calculate that $xf'(a-x)|_0^a=af'(0)$. Also,

$\int_0^axf''(a-x)dx=\frac{x^2}{2}f''(a-x)|_0^a+\int_0^a \frac{x^2}{2}f'''(a-x)dx$.

Clearly, $\frac{x^2}{2}f''(a-x)|_0^a=\frac{a^2}{2}f''(0)$.

Continuing in this fashion, we get that

$f(a)-f(0)=af'(0)+\frac{a^2}{2!}f''(0)+\dots+\frac{a^k}{k!}f^{(k)}(0)+\int_0^a\frac{x^k}{k!}f^{(k+1)}(a-x)dx$

Assuming that $f$ is smooth, if $\int_0^a \frac{x^k}{k!}f^{(k+1)}(a-x)dx\to 0$ as $k\to\infty$, then we recover the standard Taylor expansion formula.
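The formula can be checked numerically; a sketch with $f=\exp$ (so every derivative is $\exp$ and $f^{(m)}(0)=1$), $a=1$ and $k=4$:

```python
import math

# Numerical check of
#   f(a) - f(0) = sum_{m=1}^{k} a^m/m! f^{(m)}(0)
#                 + integral_0^a x^k/k! f^{(k+1)}(a-x) dx
# for f = exp, using the midpoint rule for the remainder integral.

a, k = 1.0, 4
n = 50_000
dx = a / n

remainder = sum(
    ((i + 0.5) * dx) ** k / math.factorial(k) * math.exp(a - (i + 0.5) * dx)
    for i in range(n)
) * dx

taylor_part = sum(a ** m / math.factorial(m) for m in range(1, k + 1))
exact = math.exp(a) - 1

print(exact, taylor_part + remainder)  # the two should agree closely
```

Note that the argument $a-x$ in the remainder matters: for $f=\exp$, replacing it with $x$ gives a visibly different value of the integral.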

### IMO 2011, Problem 3

IMO Problem 3: Let a function $f:\Bbb{R}\to \Bbb{R}$ satisfy the relation $f(x+y)\leq yf(x)+f(f(x))$ for all $x,y\in\Bbb{R}$. Prove that for $x\leq 0$, $f(x)=0$.

Apply the relation at the point $b$ with $y=f(a)-b$, so that $x+y=f(a)$: this gives $f(f(a))\leq (f(a)-b)f(b)+f(f(b))$. Writing the same inequality with $a$ and $b$ interchanged and adding the two, the $f(f(\cdot))$ terms cancel, and we get $0\leq 2f(a)f(b)-af(a)-bf(b)$. (Taking $a=b$ here recovers my first observation, $f(a)\left(f(a)-a\right)\geq 0$, which by itself was not enough.)

Now choose $b=2f(a)$. Then $2f(a)f(b)-bf(b)=0$, and we are left with $af(a)\leq 0$ for every $a$. In particular, $f(a)\geq 0$ whenever $a<0$.

Next, suppose $f(t)>0$ for some $t$. Fixing $x=t$ and letting $y\to-\infty$ in the relation, the right hand side $yf(t)+f(f(t))$ tends to $-\infty$, so $f$ takes negative values at points $t+y$, which are eventually negative. This contradicts the previous paragraph. Hence $f(x)\leq 0$ for all $x$.

Combining the two conclusions, $f(a)=0$ for all $a<0$. Finally, taking $x<0$ and $y=0$ in the relation gives $f(x)\leq f(f(x))$, i.e. $0\leq f(0)$; since we also know that $f(0)\leq 0$, we get $f(0)=0$. Hence $f(x)=0$ for all $x\leq 0$.