Fruits of procrastination

Origins of Astrology in India

The paper that I will be discussing today is On the Philosophical and Cosmological Foundations of Indian Astrology, by Audrius Beinorius. I’ve always been fascinated with Astrology, and am glad that I got the time and opportunity to study it. This article is not paywalled.

The Bible says that God created man in his image. You might not be surprised to learn that this belief is not original to the Bible, but has been influencing the development of humanity for thousands of years. This idea of resemblance was called sādṛśya in Ancient India, although perhaps a human-like God could be replaced by the heavens in the Indian context. I will be discussing this, and more, below.

Jyotiśāstra

The study of Astrology is called Jyotiśāstra, or the study of lights, which I think is an amazing name for the study of anything. However, was it really mentioned in the ancient Hindu texts, or was it a later invention that was falsely attributed to them (like Vedic Maths)? It turns out that Jyotiśāstra formed an integral part of the Rig Veda, Yajurveda and Atharvaveda, and the Jyotisa Vedanga was known as one of the six auxiliary sciences (aṅgas) of the Vedas. Hence, astrology was one of the most integral parts of science and society in Ancient India.

Although attributed to the Indian sage Lagadha around 400 BC, the Jyotisa Vedanga is claimed to have been revealed by Brahma (prathama muni) himself to eighteen mythological sages. According to the Matsyapurāṇa, the authority of a Rishi or a seer is based solely on his understanding of celestial movements; some of them even become stars themselves when they die, according to their Karma. The main purpose that the Jyotisa Vedanga served was to inform people about the correct time of sacrifice, and not necessarily to predict the future.

The Vedas arose for the purpose of use in sacrifices; sacrifices are enjoined according to the order of times; therefore, he who knows Jyotiṣa knows sacrifices. Just as a tuft of hair stands on the head of peacocks or a jewel in the heads of cobras, so astronomical calculations (gaṇita) stand at the head of all the sciences that are spoken of as vedāṅga.

Ṛgveda Jyotiṣa, 35 Yajurveda Jyotiṣa

According to the sage Panini, the science of the movements of heavenly bodies formed “the eye of the Vedas”. Hence, Astrology and Astronomy formed perhaps the most important sciences according to ancient Indian texts.

Space and Time

The ancients asked the same questions that high powered Physicists are still asking: what is time, and how is it created? The Bhagvata Purana, one of the holiest Indian texts that is read on a regular basis in many Indian households, says that time is the same as God, and is the primary catalyst that causes the cosmos to unfold “with its infinite potential for permutation”. In other words, time causes stars and planets to move in their eternal orbits.

On the other hand, ancient Indian astronomers thought that it was the motion of planets that “created” time, and not the other way round.

Furthermore, the knowers of time regard time as the motion of the sun, planets, and constellations, distinguished by their different revolutions.

Vākyapadīya

The time that the motion of planets created was continuous and regular, and humans artificially divided time into seconds, hours and days. Moreover, time was not just an instrument of measurement in Ancient India. It possessed a certain moral characteristic. Your karma from your previous incarnations required you to perform your dharma in your current lifetime in order to atone for your sins. But how were you to know the right time and opportunity to perform your dharma? By studying the movements of the planets. If you understood Astrology well enough, the positions of planets in certain constellations would tell you what you should do and when. The eternal motions of planets not only symbolize eternal order or sanatana dharma, but also individual order or svadharma, so that humans would know what to do in order to maintain the eternal harmony of the universe.

In the image of the heavens

But what about people who are not astrologers, and cannot study the heavens? Are they not affected by planetary motion? They indeed are, because a human was thought to be the embodiment of the whole cosmos.

A belief that permeated the whole of the ancient world (and Europe until the 16th century) was that things that resembled each other in shape were of the “same kind”. Hence, when Sphujidhvaja asserted in 270 AD that “mutual interactions (yoga) of the planets as they pass through the various signs of the Zodiac are said to be formed in the likeness of shapes”, it was concluded that objects that resembled those same shapes would now be affected by the motion of the planets. Convenient.

The early Taittirīya Brāhmaṇa (a commentary on the Vedas from the 6th century BC) had already suggested: “The constellations are images of the world”

Because the world and the constellations are images of each other, whatever happened in the world would also happen in the heavens (and vice versa). However, the world was too complicated to understand, with lots of humans swarming around killing, looting and the like. Hence, people could look up to the heavens to understand what was happening, or going to happen, in the world. As a lot of constellations resemble human shapes (the Hunter, for instance), the stars clearly mirror human activities. But astrologers took things one step further:

The Bṛhatsaṃhitā (14.1-5) then speaks of the nakṣatra-puruṣa (“man of the constellations”), whose body is formed from the twenty-seven lunar mansions. The Bṛhajjātaka of Varāhamihira (1.4) describes the kālapuruṣa (“man of time”), whose body is composed of the twelve zodiacal signs, beginning with Aries and ending with Pisces, his head and feet, respectively.

So it was not just that stars could tell man’s future. Man was a microcosm of the whole universe, with all the constellations contained inside him. Hence, when planets passed through certain constellations, one may imagine miniature planets passing through the same constellation inside the human body, mirroring the motion of the planets in the skies. This affected the good or evil inside of him, and also his fate. This can be taken as an example of taking an analogy and pushing it to its logical extreme. It wasn’t just that we could understand our fate by looking at the skies because we were created in their image. We could understand our fates by looking at the skies because we contain the skies within us.

While researching for this piece, I read some parts of another paper by David Pingree, which states that Indian astronomers originally worked with 28 constellations. However, under Babylonian influence around 500 BC, this was revised to 12. Along parallel lines, Indian astrology originally concerned itself only with knowing the right time for animal (and other) sacrifices. It was only later, with the advent of horoscopic astrology with its 12 constellations, that Indian astrology concerned itself with predicting the future. This probably gave a lot of clout to seers and sages in the courts of kings.

Conclusion

Both the author of the paper and I had similar feelings while reading up on this topic: although it’s easy to dismiss Astrology as pseudo-science, it offers a window through which to view the ancient world. In fact, things like divination are almost axiomatic. If you observe A, then you conclude B. If I see that Mercury is in Aries and Jupiter is in Taurus, I have to conclude that something good is going to happen. The only fallacy perhaps is to conclude that the motions of planets have anything to do with human fate. However, we can hardly fault the ancients for a bit of imagination, seeing as the motion of some heavenly bodies like the Sun and the Moon did actively determine their lives and fates.

References

  1. On the Philosophical and Cosmological Foundations of Indian Astrology, by Audrius Beinorius
  2. Astronomy and Astrology in India and Iran, by David Pingree

Deep Learning through bad analogies

The paper that I will be reviewing today is Deep Learning by LeCun, Bengio and Hinton. It was published in Nature in 2015, and is one of the most highly cited papers in the field with 16,750 citations. The authors of the paper went on to win the Turing Award in 2018.

I did not know any deep learning or machine learning before reading this article. Hence, a lot of my experience of reading the paper involved reading a paragraph, not knowing what the heck was going on, and then watching YouTube videos to break down what was happening. I will be linking those YouTube videos as I go along. Needless to say, I am amazed at the quality of the content related to this field that is available online.

What does deep learning do? Deep learning helps us classify data. Looking at a picture of a dog, it should be able to tell you that it is a dog, and not a pangolin (which might be useful information in these times). But how does it do that? It does this through a neural network, which is essentially a series of layers of “calculators”. Suppose we take a number which has 4 digits. This can obviously be encoded in a 4-tuple. Now we want to know if that number is divisible by 37 or not (let us imagine a world in which we care whether numbers are divisible by 37). What the layers of “calculators”, or neural units as they’re generally called, do is that they take in that 4-tuple, and spit out another 4-tuple. At the end, we will again have a 4-tuple which will look nothing like the original tuple. However, we might have a test saying that if the first component of this final tuple is >0.9, then this number has at least a 95% chance of being divisible by 37. This is only a simplistic example (which might have been better performed on an actual calculator), but what neural networks do is that they give us the probability that a certain input lies inside a certain class of objects: like the probability of the 4-digit number that was input to lie inside the class of numbers that are divisible by 37.
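
To make the “layers of calculators” picture concrete, here is a minimal numpy sketch of such a network making a (random, untrained) guess. The layer sizes, the tanh and sigmoid squashing functions, and the divisible-by-37 framing are my own toy choices for illustration, not anything prescribed by the paper.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 4-digit number encoded as a 4-tuple of its digits, scaled to [0, 1].
n = 4736
x = np.array([int(d) for d in str(n)]) / 9.0

# Two layers of "calculators" (neural units) with random, untrained weights.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # takes in a 4-tuple, spits out an 8-tuple
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # takes in the 8-tuple, spits out one score

h = np.tanh(x @ W1 + b1)      # the intermediate tuple looks nothing like the input
p = sigmoid(h @ W2 + b2)      # squash the final score into something probability-like

print(f"untrained guess that {n} is divisible by 37: {p[0]:.2f}")

With random weights the output is of course meaningless; making it meaningful is what training, discussed in the next section, is for.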

Supervised Learning and Back Propagation

How do we train a neural network to give us the right outputs? The same way you train a kid to behave around guests. When they scream at guests and break the china, you send them outside and ask them to re-evaluate their behavior. You keep doing that until they start behaving themselves around guests.

A neural network has been displayed below:

Each edge corresponds to certain weights, or punishment techniques that will help kids learn how to behave around strangers. You may for instance try a variety of punishments: offer them ice cream, ask them to go to their room, threaten to take away their toys, or tell them that they will only be allowed to eat in yellow bowls instead of white from now on. After trying combinations of all these punishment techniques, you find out the following: you should take away ice cream instead of offering more (negative gradient for this weight), children’s behavior improves if you ask them to go to their room, their behavior improves much faster if you threaten to take away their toys, and that they really don’t care if they eat in white bowls or yellow bowls. From this analysis, you conclude that you will be best served if you do a combination of threatening to take away their toys, and taking away their ice cream.

Similarly, neural networks might take in the picture of a dog, and tell you that it’s a cat. You’ll then have to adjust the weights associated with the edges such that this error is corrected. Because millions of inputs are used to train neural networks, some inputs might ask us to increase certain weights, whilst others might require us to decrease them. Hence, we take an average desired change in weights over all inputs, and then change weights accordingly so as to decrease the error “on average”.

What is back propagation? It is a fancy way of saying that we correct the values of the weights backwards. Refer to the image of the neural network given above. First we change the weights of the edges connecting the units in Hidden Units H2 to the output layer in the diagram given above, so as to minimize error. Then, using those new values, we change the weights of the edges connecting Hidden Units H1 to Hidden Units H2, in order to minimize the error further, and so on. An amazing video that explains this much better than I ever could is this.
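
Continuing the toy divisible-by-37 example from above, here is a minimal sketch of this procedure: the error is computed at the output, averaged over all inputs, pushed backwards through the layers, and the weights are nudged against their gradients. The learning rate, layer sizes and number of steps are arbitrary choices of mine, and divisibility by 37 is genuinely hard to read off digits, so do not expect much better than base-rate accuracy; the mechanics of the backward pass are the point.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4-digit numbers encoded as digit 4-tuples, labelled 1 if divisible by 37.
numbers = rng.integers(1000, 10000, size=2000)
X = np.array([[int(d) for d in str(n)] for n in numbers]) / 9.0
y = (numbers % 37 == 0).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(0, 0.5, size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, size=(16, 1)), np.zeros(1)
lr = 0.5

for step in range(3000):
    # Forward pass through both layers.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

    # Backward pass: error at the output, averaged over all inputs ...
    d_out = (p - y) / len(X)
    # ... pushed back through the last layer's weights to the hidden layer ...
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)
    # ... and every layer's weights nudged against its averaged gradient.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print("training accuracy:", float(((p > 0.5) == y.astype(bool)).mean()))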

Unsupervised Training

But what if you don’t have the time to train your child? Would it still be likely that the child behaves well in front of unknown guests? A kid will not necessarily know good from bad. However, a sufficiently observant child might know that screaming and breaking china evoke certain kinds of reactions from parents and guests, and that laughing and playing with guests evoke other kinds of reactions. This is similar to the clustering algorithm used in unsupervised training. If you feed new raw data without labels into an unsupervised neural network, it may not know the *names* of the classes that the data lies in, but it can plot that data, and determine that the data can be subdivided into certain classes. Look at the illustration below for an example of clustering:
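
In the same spirit as that illustration, here is a minimal clustering sketch using scikit-learn’s k-means on made-up two-dimensional data; the two blobs and the choice of two clusters are my own toy setup, not anything from the paper.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two made-up blobs of unlabelled 2-D points: the algorithm is never told their names.
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[4, 4], scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

# k-means only discovers that the data falls into two groups, not what the groups mean.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centres:\n", kmeans.cluster_centers_)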

Another feature of unsupervised learning is that instead of classifying, it can encode images and then decode them. What this does is remove noise, preserving only the important details of the input. An illustration is given below:

Much like an unsupervised child, an unsupervised neural network can do a lot of things without being trained by its parents. All the above illustrations have been taken from this amazing video.

Convolutional Neural Networks

Suppose that in a more fun parallel universe, you’re asked to locate a 1 ft tall Mickey Mouse statue in a dimly lit room. How would you do that? One way is the following: you look around the room to see if you can spot objects that are approximately 1 ft tall. Objects that are too big or too small are automatically eliminated. Then amongst the more or less 1 ft tall objects, you should look for objects that have two round ears sticking out. You must have eliminated everything else apart from, possibly, Minnie Mouse. Now you can perhaps concentrate harder on facial features to conclusively locate the Mickey Mouse statue.

Similarly, convolutional networks are neural networks that try to detect features of an input image step by step, in order to determine what it is an image of. In the first hidden unit layer, the convolutional layer might detect where in the image one may find circles. This information is now input into the second hidden layer, which might detect whiskers, or spots, or perhaps stripes. At the end of this process, perhaps after being processed through thousands of layers of hidden units, the neural network might be able to tell us what the image is of.

The detection of edges and other features happens by breaking the input image into pixels, and constructing a very large matrix containing information about those pixels. Say this pixel matrix contains a 1 whenever the color black is present in a pixel, and 0 whenever white is present. Gray would perhaps correspond to 0.5. Now construct a much smaller matrix, say a matrix of the form \begin{pmatrix}1&1\\0&0 \end{pmatrix}.

This matrix corresponds to two (short) rows of pixels, the top row containing the color black, and the bottom row white. Now we take this small matrix, and for every 2\times 2 sub-matrix of the bigger pixel matrix, multiply it element-wise with that sub-matrix and add up the entries. This process helps us detect which parts of the image look like one row of black on top of one row of white, which helps us find horizontal edges between black and white regions. All this has been amazingly explained in this video.
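
Here is a minimal numpy sketch of that sliding “multiply and add” operation, using the 2\times 2 matrix from the text on a made-up 6\times 6 image of my own.

import numpy as np

# A tiny black-and-white "image": 1 where a pixel is black, 0 where it is white.
# The top half is black and the bottom half is white, so there is one horizontal edge.
image = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

# The small 2x2 matrix from the text: a row of black on top of a row of white.
kernel = np.array([[1, 1],
                   [0, 0]], dtype=float)

# Slide the kernel over every 2x2 patch, take the element-wise product and sum it.
rows, cols = image.shape
response = np.zeros((rows - 1, cols - 1))
for i in range(rows - 1):
    for j in range(cols - 1):
        patch = image[i:i + 2, j:j + 2]
        response[i, j] = np.sum(patch * kernel)

print(response)

With this particular kernel the response is large wherever the top row of a patch is black; a kernel with -1’s in its bottom row (instead of 0’s) would respond strongly only at the black-over-white boundary itself.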

Natural Language Processing

When we type “neural” into the Google search bar, how does Google know that we mean “neural networks” (*meta*)? This is because the neural network responsible for making these predictions has been trained on billions of pages of text, which help it “understand” which words make meaningful phrases. I will mostly be referring to this video on natural language processing.

Let us suppose that each word in the English language has been assigned a 50-tuple, in which each entry of the tuple has some number. For instance, it is possible that the word “neural” corresponds to (1,1,1,0,0,\dots,0). Now suppose we decide to train the neural network to understand English phrases that are three words long. We feed billions of pages of text into it, and train it to give us a positive score whenever the phrase is meaningful, like “neural networks rock”, and give a negative score whenever the phrase is not meaningful, like “Amit Shah rocks”.

But how does a neural network mathematically process a phrase? It takes the three 50-tuples corresponding to each word, glues them together, and now makes a 150-tuple. Assuming that the weights of all the edges in this neural network remain constant, we’re going to change the entries in the 50-tuple of each word such that when the neural network performs mathematical manipulations on the 150-tuples, we get a positive answer for meaningful phrases, and a negative answer for nonsense phrases. So after all this training is done, the words “Amit” might correspond to (1,0,\dots,0), “Shah” might correspond to (0,1,\dots,0), and “rocks” might correspond to (-1,-1,-1,\dots,-1). Hence, when the 150-tuple (1,0,\dots,0,1,\dots,0,-1,-1,\dots,-1) is plugged into the neural network, we get a negative answer.
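
Here is a minimal sketch of the gluing-and-scoring step described above, with made-up words, a tiny 8-entry tuple per word instead of 50, and random untrained weights; the training loop that would actually push meaningful phrases towards positive scores is omitted.

import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8   # the post uses 50-tuples; 8 keeps the toy example readable

# A made-up vocabulary, with a trainable tuple ("embedding") for each word.
vocab = ["neural", "networks", "rock", "Amit", "Shah", "rocks"]
embeddings = {word: rng.normal(size=EMB_DIM) for word in vocab}

# An untrained scoring layer: it sees the glued-together 3 * EMB_DIM tuple
# and maps it to a single score (positive should mean "meaningful phrase").
W = rng.normal(size=3 * EMB_DIM)
b = 0.0

def phrase_score(phrase):
    # Glue the three word-tuples into one long tuple, then score it.
    glued = np.concatenate([embeddings[word] for word in phrase.split()])
    return float(glued @ W + b)

# Training would nudge the word tuples (and possibly W) until phrases seen in
# real text score positive and corrupted phrases score negative; that is omitted.
print(phrase_score("neural networks rock"))
print(phrase_score("Amit Shah rocks"))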

But then how does word prediction work? When we type “Amit Shah” into the Google search bar, the neural network tries all the words that would complete the three-word phrase with a positive score. Then it calculates the probability of each of those words completing that phrase, probably depending on their frequency of use in the training data, and offers the words with the highest probabilities as predictions just below the search bar.

Recurrent Neural Networks

Let us now talk about speech recognition. Suppose you say the phrase “Black Lives Matter” (in order to train your neural network to do right by society). Each word that you say has to be separately interpreted by the neural network. This is a task that a recurrent neural network can perform: it takes as input each word that you say, assigns probabilities to the various options corresponding to that input, stores them, and then takes in the next input.

For instance, when you say “Black”, the neural network assigns a 99% probability that you said black, and a 1% probability that you said blot, based on the clarity of your speech. This gets stored (in Long Short Term Memory (LSTM) units). Now let us assume that when you say “Lives”, your speech is not clear. Hence, the neural network assigns a 40% probability to that word being Lives, and a 60% probability to that word being Pies. Now when you say “Matter”, your speech is much clearer, and the neural network is 99% sure that you said “Matter”. Now it tries to process the two phrases “Black Lives Matter” and “Black Pies Matter”. The neural network will give a negative score for the latter phrase as it has never seen it before, or perhaps has been negatively trained on it, and a positive score for the former. Hence, the neural network ultimately detects “Black Lives Matter”. We hope that it says an emphatic “Yes” in response.
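
Here is a toy sketch of that final step: made-up per-word probabilities are combined with a stand-in phrase score, and the best-scoring transcription wins. The way the two numbers are combined here is purely illustrative, not how an actual LSTM-based recognizer works.

import itertools

# Made-up per-word guesses: each spoken word comes with candidate transcriptions
# and probabilities (the kind of thing stored in the LSTM units above).
candidates = [
    {"Black": 0.99, "Blot": 0.01},
    {"Lives": 0.40, "Pies": 0.60},
    {"Matter": 0.99, "Mutter": 0.01},
]

# A stand-in language model: the one phrase the model has "seen" gets a positive
# score, everything else a negative one. A real model would compute this score.
def phrase_score(phrase):
    return 1.0 if phrase == ("Black", "Lives", "Matter") else -1.0

best_phrase, best_value = None, float("-inf")
for phrase in itertools.product(*[c.keys() for c in candidates]):
    acoustic = 1.0
    for word, probs in zip(phrase, candidates):
        acoustic *= probs[word]
    value = acoustic * phrase_score(phrase)   # combine acoustics with the phrase score
    if value > best_value:
        best_phrase, best_value = phrase, value

print(" ".join(best_phrase))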

Future directions

The authors opine that much of the future of deep learning lies in further developing unsupervised learning, and in basing classification algorithms on human vision, which focuses on only a small part of an object and blurs everything else. The authors conclude that combining representation learning with complex reasoning is what will be most useful in the long run.

Thanks for reading!

References

  1. https://www.nature.com/articles/nature14539#citeas
  2. https://www.youtube.com/watch?v=lEfrr0Yr684
  3. https://www.youtube.com/watch?v=YRhxdVk_sIs
  4. https://www.youtube.com/watch?v=V8qrVleGY5U

Of mountain passes and migration in the olden days

The paper that I read today is Why Mountain Passes are Higher in the Tropics by Daniel H. Janzen. I found this paper by browsing the list “100 articles every ecologist should read”, which was published in Nature Ecology & Evolution in 2017.

What were the obstructions to the migration of species of animals in the ancient world, and what made species in temperate regions more suited to migration than those in the tropical regions?

One might imagine that tall mountains would be an obstruction to migration: high mountains would prevent species living in the valleys from travelling through impassable mountain passes to other parts of the world. However, they were less of an obstruction to species’ migration in temperate regions of the world, as compared to tropical regions. Why was that?

One of the factors that could affect migration is this: if the climate in the valley and in the high mountain passes were comparable, then species that were already thriving in the valley could easily travel through the mountain passes and spread. Maybe the weather conditions of mountain passes and valleys were much more similar in temperate regions as compared to the tropics? Could that explain the fact that species migrated much more in temperate regions than in tropical regions? The author, after a statistical study, concludes the following:

The climatic differences between the mountain passes and the valley in tropical and temperate regions were comparable. Hence, this wouldn’t be a deciding factor in migration.

The deciding factor in migration, however, was that temperatures varied a lot more annually in the temperate regions than in the tropics. Anyone living on the East Coast can attest to that. Hence, by natural selection, the species in temperate regions were a lot more resistant to wildly varying temperatures than their tropical counterparts, who were used to similar temperatures throughout the year.

These toughened and highly resistant species would then cross mountain passes, and spread out over a much larger area than species in the tropics.

This also explains another interesting observation: there is a much larger variety of species found in the tropics than in temperate regions. This is because of the following reason: when a few unusually resistant individuals of a tropical species were able to cross mountain passes and travel to other regions, they would not be followed by their less resistant counterparts. Hence, they could go to new regions and populate them with their evolutionarily advantaged genes, without any mixing and interference from their unfit counterparts still stuck in their old habitat.

Hence, mountain passes are not literally higher in the tropics. What was found to be lacking, however, was the ability of tropical species to cross them.

<Insert moral about climbing mountains or something similar>

References:

  1. https://www.wsl.ch/lud/biodiversity_events/papers/niklaus.zimmermann@wsl.ch_2017_12_12AmNat_Janzen_1967.pdf
  2. https://www.nature.com/articles/s41559-017-0370-9

Capacitors and making complicated shapes simpler

The paper that I intend to blog about is: Parallel-Plate Capacitor by the Schwartz-Christoffel Transformation by Harlan B. Palmer.

I intend to read some research papers in various academic fields, and, perhaps amateurishly, blog about them. We could all be better informed about what is happening outside our narrow fields of interest. Although I hope to be regular about this, it is likely that this is going to be a sporadic bunch of posts until my employer asks that I devote more time to my day job.

This is the paper by Harlan B. Palmer that I am referring to.

How do we measure the electrical “energy” (electrostatic potential energy) or electrical field strength of a capacitor? If we can somehow measure the flux density between the plates of the capacitor, we will have the answer. However, in such a calculation, we only consider the sides of the plates facing each other. We don’t consider the back parts of these plates. These also have charge, and hence affect the electrical field strength. The author contends that the true value of the capacitance, which depends on the electrostatic potential, might be “several thousand percent greater than indicated”.

Another complicating factor is “fringing”. Near the ends of the plates, the flux lines veer towards the center of the opposite plates, and don’t go straight up. This makes the flux lines more difficult to deal with mathematically.

Consider the figure given below:

We take a polygon containing the front and back sides of both plates. So ideally, if we can integrate the flux attached to all sides of the plates, we should be able to calculate capacitance. However, the flux lines are all twisty and non-uniform, the sides of the polygon seem complicated and difficult to deal with, and the whole enterprise looks very challenging.

What if we could “straighten” this messy polygon into a rectangle, such that all the flux lines pass through the sides of this rectangle uniformly? That would make life simple! This miraculous transformation is achieved by the Schwartz-Christoffel transformation.

The Schwartz-Christoffel Transformation takes the figure given below, and maps it to any polygon that is desired:

Moreover, this transformation is reversible. Hence, effectively we can map any polygon in the complex plane to any other polygon in the complex plane, and that too conformally. What this means is that the physics is preserved: the lines of electric flux should always be perpendicular to equipotential lines, and because this is the case in the complicated polygon, it remains the case in the simpler rectangle.

The Schwartz-Christoffel transformation looks like this:

Z=\int (\zeta-\zeta_a)^{m_a}(\zeta-\zeta_b)^{m_b}\dots (\zeta-\zeta_n)^{m_n}d\zeta

Here, the m_i's are telling Z what angle to turn by, in order to form the desired polygon. Also, we can determine the lengths of the sides of the polygon such that the flux lines attached to the plates indeed have uniform density.
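
As a small worked example of how these exponents encode the corners (this example is mine, not from Palmer's paper): with the usual convention, the exponent attached to \zeta_k is m_k=\frac{\alpha_k}{\pi}-1, where \alpha_k is the interior angle of the polygon at the corresponding vertex. Taking just two vertices, at \zeta=\pm 1, each with interior angle \frac{\pi}{2} (so m=-\frac{1}{2}), gives, up to the usual scaling and translation constants,

Z=\int (\zeta-1)^{-1/2}(\zeta+1)^{-1/2}d\zeta=\int \frac{d\zeta}{\sqrt{\zeta^2-1}}=\cosh^{-1}\zeta,

which maps the upper half-plane onto a semi-infinite strip, i.e. a degenerate "polygon" with exactly two right-angled corners. The capacitor polygon of the paper is handled the same way, just with more vertices and more exponents.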

What I think has happened is this. Look at the rectangle below:

If you take the top plate of the capacitor, peel out the front and back sides of this plate and lay them side by side, you’ll get the top side of this rectangle. Similarly, if you take the front and back sides of the lower plate of the capacitor and lay them out side by side, you’ll get the bottom side of this rectangle. Moreover, there is no “fringing”, as you may imagine that the thin polygon in Figure 1b has been straightened out. Hence, the flux lines are now uniform. This will now lead to easy calculations without error.

References

  1. Parallel-Plate Capacitor by the Schwartz-Christoffel Transformation, by Harlan B. Palmer

Some problems from Indian National Mathematics Olympiad, 2014

I just want to add a couple of problems from INMO 2014 that I solved this morning. The first problem was slightly less tricky, and just involved pairing divisors with each other in the most obvious way. However, the second problem is quite devious, and is more of an existence proof than writing down an explicit winning strategy.

  • Let n be a natural number. Prove that \left\lfloor \frac{n}{1} \right\rfloor+\left\lfloor \frac{n}{2} \right\rfloor+\dots+\left\lfloor \frac{n}{n} \right\rfloor+\left\lfloor \sqrt{n} \right\rfloor is even.

We will prove this by induction. Let f(n)=\left\lfloor \frac{n}{1} \right\rfloor+\left\lfloor \frac{n}{2} \right\rfloor+\dots+\left\lfloor \frac{n}{n} \right\rfloor+\left\lfloor \sqrt{n} \right\rfloor. Note that f(1)=2 is even. Also, note that f(i+1)=f(i)+ the number of divisors of i+1, plus the change in the square root term: it is precisely at the divisors of i+1 that 1 is added to the corresponding floor terms. If i+1 is not a square, then its divisors can be paired up as (d,\frac{i+1}{d}), so the number of divisors is even; moreover, \lfloor \sqrt{i}\rfloor=\lfloor \sqrt{i+1}\rfloor. However, if i+1 is indeed a square, then \sqrt{i+1} is a divisor that cannot be paired up with a different divisor, as \frac{i+1}{\sqrt{i+1}}=\sqrt{i+1} itself; but in that case \lfloor \sqrt{i+1}\rfloor=\lfloor \sqrt{i}\rfloor+1. Hence, we get two extra 1’s, which still keeps the total increment even. Either way, f(i) and f(i+1) have the same parity. Now as f(1)=2 is even, f(i) is even for all i\geq 1. Hence proved. QED
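
A quick brute-force check of the claim for small n (just a sanity check, not part of the proof):

import math

def f(n):
    return sum(n // k for k in range(1, n + 1)) + math.isqrt(n)

# Every value should be even.
assert all(f(n) % 2 == 0 for n in range(1, 2001))
print("f(n) is even for every n up to 2000")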

  • Written on a blackboard is the polynomial x^2+x+2014. Calvin and Hobbes take turns alternately (starting with Calvin) in the following game. During his turn, Calvin should either increase or decrease the coefficient of x by 1. And during his turn, Hobbes should either increase or decrease the constant coefficient by 1. Calvin wins if at any point of time the polynomial on the blackboard has integer roots. Prove that Calvin has a winning strategy.

Calvin’s first move should be to increase the coefficient of x by 1. We will prove that Calvin has a winning strategy by induction.

Observe that for x^2+x+2, Calvin has the following winning strategy: after the first move, the polynomial becomes x^2+2x+2. Now if Hobbes keeps increasing the constant coefficient, Calvin should do the same. If Hobbes reduces the constant coefficient at any step, the polynomial will become x^2+ax+(a-1), which has the integer roots -1 and -(a-1). Hence, Calvin has a winning strategy for x^2+x+2k when k=1.

Note that for the general polynomial x^2+x+2k, if both players keep increasing the coefficients without ever decreasing them, ultimately the polynomial becomes x^2+(1+k+3)x+(2k+k+3)=x^2+(k+4)x+3(k+1), which has the integer roots -3 and -(k+1). Hence, in order to avoid losing, Hobbes has to reduce the constant coefficient at least once.

We will use a form of strong induction here: if Calvin has a winning strategy for x^2+x+2k, then he has a winning strategy for any polynomial of the form x^2+(1+a)x+(2k+a), where a\leq k+3.

Let us assume that Calvin has a winning strategy for x^2+x+2k. From the last statement, we know that this implies that Calvin has a winning strategy for x^2+(1+a)x+(2k+a), where a\leq k+3. Now let us consider x^2+x+2(k+1). If Hobbes keeps increasing the constant coefficient, then we know that Calvin should do the same, and that is a winning strategy. If, after a moves with a<k+1+3, Hobbes decreases the constant coefficient on the (a+1)th move, then the polynomial becomes x^2+(1+a+1)x+(2k+2+a-1)=x^2+(1+(a+1))x+(2k+(a+1)). By strong induction, we know that Calvin has a winning strategy for this. Hence proved. QED.

Exploring Indian Mathematical Olympiads

Indian math olympiad questions are famous (infamous?) for being very analytical. There mostly do not exist clever one-line proofs. Multiple cases have to be analyzed and exhaustively eliminated before arriving upon the correct answer.

I tried solving problems from the Indian National Mathematics Olympiad, 2013 today. My solutions are different from the official solutions given online (lengthier, and hence perhaps more instructive). Hopefully they can be of help to anyone interested in mathematical problem solving.

The Indian National Mathematics Olympiad (INMO) is offered to those candidates who have cleared the Regional Mathematics Olympiad (RMO). The questions that are given are of a difficulty level comparable with P1, P4 of the International Math Olympiad.

I do plan on running a seminar on competitive math problem solving this summer. This is useful for all sorts of things, like getting better at algorithms, improving analytical skills in finance, etc. Please do let me know if you’d be interested in that sort of thing.

  • Find all positive integers m,n and primes p\geq 5, such that m(4m^2+m+12)=3(p^n-1)

First, we make a couple of observations. As p^n-1 is a multiple of 2, m will also have to be a multiple of 2. This can be re-written as 4m^3+m^2+12m+3=3p^n. On factorizing the left hand side, we get (4m+1)(m^2+3)=3p^n. Hence, as the right hand side is a multiple of 3, either 4m+1 is a multiple of 3, or m is a multiple of 3.

If 4m+1 is a multiple of 3 and m is not, then it has to be a multiple of 9, as m\equiv 2\pmod 3 is the only possibility. This is impossible as p\geq 5. Hence, 3|m. Note that from our previous observation, we also know that 2|m. Hence, combining the two, we get 6|m.

Let m=6m'. Then we have (24m'+1)(12m'^2+1)=p^n (we obtain this after canceling out 3 on both sides). For m'=1, the left hand side is 25\cdot 13, which is not a prime power; hence m'\geq 2. For m'\geq 2, we have 24m'+1\leq 12m'^2+1. Hence, as both factors are powers of p, we have 24m'+1|12m'^2+1. This implies that 24m'+1|12m'^2+1-(24m'+1)=12m'(m'-2). As 24m'+1 is coprime to 12m', it must divide m'-2, which is only possible if m'-2=0, or m'=2. As m=6m', we get m=12.

From this, we can deduce that (24m'+1)(12m'^2+1)=49^2, and hence p=7 and n=4. (12,7) is the only solution. QED
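
A quick brute-force search over small m (a sanity check of the answer, not a proof; the search bounds are arbitrary):

def is_prime(q):
    return q > 1 and all(q % d for d in range(2, int(q ** 0.5) + 1))

# Search m up to 200 and primes p >= 5 up to 100 for m(4m^2 + m + 12) = 3(p^n - 1).
for m in range(1, 201):
    lhs = m * (4 * m * m + m + 12)
    if lhs % 3:
        continue
    target = lhs // 3 + 1                 # this must equal p^n
    for p in (q for q in range(5, 100) if is_prime(q)):
        n, power = 1, p
        while power < target:
            power *= p
            n += 1
        if power == target:
            print(m, p, n)                # only 12 7 4 should appear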

  • Let a,b,c,d be positive integers such that a\geq b\geq c\geq d. Prove that x^4-ax^3-bx^2-cx-d has no integer solutions.

Let x be an integer root. Clearly x=0 cannot be a root, as then we would have d=0, which is impossible as d is a positive integer. Hence, let us assume that x is a non-zero integer. Then we have x^4-ax^3-bx^2-cx=d. Therefore, x|d. Let d'=\frac{d}{x}. Then we have x^3-ax^2-bx-c=d'. Continuing like this, we get x=a+b'+c''+d''', where b'=\frac{b}{x}, c''=\frac{c}{x^2}, and d'''=\frac{d}{x^3}.

In other words, we have x=a+\frac{b+c'+d''}{a+b'+c''+d'''}, where c'=\frac{c}{x} and d''=\frac{d}{x^2}. Note also that x must be positive: for x\leq -1 we have x^4-ax^3-bx^2-cx-d=x^2(x^2-ax-b)-(cx+d)\geq x^2(1+a-b)+(c-d)\geq 1>0, so there are no negative roots. Now, as a\geq b\geq c\geq d and x>0, we have 0<\frac{b+c'+d''}{a+b'+c''+d'''}\leq \frac{a+b'+c''}{a+b'+c''+d'''}<1. Hence, x is not an integer. Hence proved. QED

  •  Let n be a positive integer. Call a nonempty subset S of \{1,2,\dots,n\} good if the arithmetic mean of the elements of S is also an integer. Further, let t_n denote the number of good subsets of \{1,2,\dots,n\}. Prove that t_n and n are either both odd or even.

Clearly, this is equivalent to proving that t_n-n is always even. Note that the n subsets consisting of a single element each have an arithmetic mean that is an integer. Hence, it suffices to prove that the number of good subsets of \{1,2,\dots,n\} containing at least two elements is even. The way that we’ll prove this assertion is by pairing sets with each other: if \{a_1,\dots,a_k\} is one such subset, then consider the set \{n+1-a_1,\dots,n+1-a_k\}. Clearly this set will also have an integer average, and the map i\to n+1-i is an involution. Hence, this pairing matches good sets with good sets.

Case 1: Let n be even. The only way this pairing can map a set to itself is if, for every i in the set, n+1-i is also in it. As there is no i\in\{1,2,\dots,n\} such that i=n+1-i (since n+1 is odd), such a self-paired set must contain an even number of elements. Hence, every good set with an odd number of elements is paired with a different good set, so their number is even. As for good sets with an even number of elements: a self-paired set consists of, say, k pairs (i,n+1-i), and its average is \frac{k(n+1)}{2k}=\frac{n+1}{2}, which is not an integer as n+1 is odd. Hence, every good set with an even number of elements is also paired with a different good set, and their number is even too. In particular, the number of good sets with at least two elements is even.

Case 2: Let n be odd. Let us study the good sets for which the map i\to n+1-i does not produce a different set. The only difference from the previous case is that i=\frac{n+1}{2} is now a fixed point of the map, since n+1-i=i for it. Hence, self-paired sets with both odd and even numbers of elements can occur. However, if we can show that the number of self-paired good sets with at least two elements is even, then, combined with the fact that the number of good sets which map to other good sets is even anyway, we get an even total.

Let us first count the self-paired sets with an odd number of (at least three) elements. Any such set must contain the element \frac{n+1}{2}, together with a nonempty collection of the \frac{n-1}{2} pairs (i,n+1-i); its average is then \frac{n+1}{2}, an integer, so every such set is automatically good. The total number of such sets is hence 2^{(n-1)/2}-1.

Now count the self-paired sets with an even number of elements. Such a set cannot contain \frac{n+1}{2}, and consists of a nonempty collection of the pairs (i,n+1-i); again its average is \frac{n+1}{2}, so it is good. The total number of such sets is also 2^{(n-1)/2}-1.

Adding both counts together, we get (2^{(n-1)/2}-1)+(2^{(n-1)/2}-1)=2^{(n+1)/2}-2 good sets with at least two elements that map to themselves under this pairing, which is an even number.

Hence, the total number of good sets with at least two elements is even, which proves our assertion. QED
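
A quick brute-force check of the claim for small n:

from itertools import combinations

def t(n):
    """Number of nonempty subsets of {1,...,n} whose arithmetic mean is an integer."""
    return sum(1
               for k in range(1, n + 1)
               for subset in combinations(range(1, n + 1), k)
               if sum(subset) % k == 0)

for n in range(1, 13):
    assert t(n) % 2 == n % 2
print("t(n) and n have the same parity for all n up to 12")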

  •  Let a,b,c,x,y,z be positive real numbers such that a+b+c=x+y+z and abc=xyz. If a<b<c, x<y<z, and a\leq x<y<z\leq c, then prove that x=a,y=b and z=c.

This is one of my favorite problems. Without loss of generality, let x-a\geq c-z. Then, keeping a+b+c constant, bring c\to z and a\to a+(c-z)=a'. This keeps b unchanged, and hence a'z\geq ac (as (a+t)(c-t)\geq ac for 0\leq t\leq c-a), which implies a'bz\geq abc. Now, keeping z constant, bring a'\to x and b\to y. This again implies that xy\geq a'b, which implies that xyz\geq a'bz\geq abc. Hence, xyz\geq abc if any of these changes happens. Hence, as abc=xyz, none of these changes should happen, and a=x, b=y and c=z. QED

An interesting Putnam problem on the Pigeonhole Principle

The following problem is contained in the book “Putnam and Beyond” by Gelca, and I saw it on stackexchange. I’m mainly recording this solution because it took me longer than usual to come up with it, as I was led down the wrong path many a time. Noting what is sufficient for the product of a block of numbers to be a square is the main challenge in solving this problem.

Let there be a sequence of m terms, all of which belong to a set of n natural numbers. Prove that if 2^n\leq m, then there exists a block of consecutive terms, the product of which is a square number.

Let the n numbers be \{a_1,\dots,a_n\}, and consider the function f(k)=( a tuple of 0's and 1's), where the 0's and 1's denote the number of times \pmod 2 that each element a_i has appeared from the 1st to the kth element of the sequence of positive integers.

So f(1)=(1 somewhere, and the rest of the terms are 0), etc.

Clearly, if f(k)=(0,0,\dots,0) for any k, then the product of the terms from the 1st to the kth is a square. If no f(k) is (0,0,\dots,0), then there are only 2^n-1 possible nonzero tuples, but at least 2^n values of k (as m\geq 2^n). Hence, two of them must be equal. Let us suppose that f(k_1)=f(k_2) with k_1<k_2. Then the product of the terms from the (k_1+1)th to the k_2th is a square. Hence proved.
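
The argument translates almost directly into code; here is a short sketch (the particular set of values and the random sequence are arbitrary):

import math
import random

def square_block(seq, values):
    """Return (i, j) such that the product of seq[i:j] is a perfect square,
    assuming every term lies in `values` and len(seq) >= 2 ** len(values)."""
    index = {v: k for k, v in enumerate(values)}
    parity = [0] * len(values)
    seen = {tuple(parity): 0}            # parity vector of the empty prefix
    for j, term in enumerate(seq, start=1):
        parity[index[term]] ^= 1         # appearances of each value, mod 2
        key = tuple(parity)
        if key in seen:                  # two equal prefixes: the block between them works
            return seen[key], j
        seen[key] = j

values = [2, 3, 5]
seq = [random.choice(values) for _ in range(2 ** len(values))]
i, j = square_block(seq, values)
product = math.prod(seq[i:j])
print(seq, seq[i:j], product, math.isqrt(product) ** 2 == product)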

Proving that the first two and last two indices of the Riemann curvature tensor commute

I’ve always been confused by the combinatorial aspect of proving the properties of the Riemann curvature tensor. I want to record my proof of the fact that R(X,Y,Z,W)=R(Z,W,X,Y). This is different from the standard proof given in books. I have been unable to prove this theorem in the past, and hence am happy to write down my proof finally.

Define the function f(R(X,Y,Z,W))=R(X,Y,Z,W)-R(Z,W,X,Y). We want to prove that this function is 0.

By simple usage of the facts that R(X,Y,Z,W)+R(Y,Z,X,W)+R(Z,X,Y,W)=0 and that switching the first two or the last two vector fields gives us a negative sign, we can see that

R(X,Y,Z,W)-R(Z,W,X,Y)=R(X,W,Y,Z)-R(Y,Z,X,W).

Hence, f(R(X,Y,Z,W))=f(R(X,W,Y,Z)).
Now note that R(X,Y,Z,W)=R(Y,X,W,Z). This is obtained by switching both the first two and the last two indices. However,

R(Y,X,W,Z)-R(W,Z,Y,X)=R(Y,Z,X,W)-R(X,W,Y,Z)=-f(R(X,W,Y,Z)).

As f(R(X,Y,Z,W)) is equal to both f(R(X,W,Y,Z)) and -f(R(X,W,Y,Z)), we can conclude that it is 0.

Hence, R(X,Y,Z,W)=R(Z,W,X,Y).

It is not easy to prove this theorem because just manipulating the indices mindlessly (or even with some game plan) can lead you down a rabbit hole without ever reaching a conclusion. Meta-observations, like the one above, are necessary to prove this assertion.

(More or less) Effective Altruism- March and April

In March, I donated $250 to EA.

In April, I decided to donate $250 instead to the Association for India’s Development to fight coronavirus in India.

Thinking about a notorious Putnam problem

Consider the following Putnam question from the 2018 exam:

Consider a smooth function f:\Bbb{R}\to\Bbb{R} such that f\geq 0, and f(0)=0 and f(1)=1. Prove that there exists a point x and a positive integer n such that f^{(n)}(x)<0.

This is a problem from the 2018 Putnam, and only 10 students were able to solve it completely, making it the hardest question on the exam. I spent a day thinking about it, and my “proof” differs a lot from the official solutions, and is really a heuristic.

Proof: Assume that there does not exist any x and n such that f^{(n)}(x)<0. We will compare f with functions of the form x^n on [0,1]. We will prove that f\leq x^n on [0,1]. Because x^n\to 0 on [0,1) as n\to\infty, we will have proven that f=0 on [0,1) while f(1)=1. Hence, f cannot even be continuous, let alone smooth.

Why is f\leq x^n? Let us first analyze what f looks like. It is easy to see that f(x)=0 for x\leq 0. This is because, as f\geq 0 and f(0)=0, if f(x)>0 for some x<0, then f would have to decrease to 0 at x=0. Hence, there would be a negative derivative involved, which is a contradiction. Hence, f(x)=0 for x\leq 0, and by continuity of derivatives for smooth functions, all derivatives at x=0 are also 0.

Now consider the functions x^n, which are 0 at x=0 and 1 at x=1. These are the same endpoints for f(x) in [0,1]. If f(x) ever crosses x^n in [0,1), then it will have a higher nth derivative than x^n at the point of intersection. As its (n+1)th derivative is also non-negative, f will just keep shooting above x^n, and hence never “return” to x^n at x=1. This contradicts the fact that f(1)=1. Hence, f will always be bounded above by x^n in [0,1]. As this is true for all n, f=0 on [0,1) and f(1)=1. This contradicts the fact that f is continuous.