The concept of “social capital” has become one of the rare terms to transcend the rarefied world of sociology into everyday language. The media makes it sound like the solution to most of the world’s problems, both between individuals and countries. Due to the diverse applications to which this term has been put, there is no one definition of the term anymore.

Although this term has its historical roots in the writings of Marx and Durkheim, its modern presentation leaves much to be desired. Sociologists often present only its positive aspects whilst leaving aside the negative ones. Also, “social capital” is often interpreted as similar to monetary capital in its capacity to provide an individual with power, status or opportunities. Some authors have even gone to the extent of saying that cities and countries, and not just individuals, can possess social capital, and the presence of this ill-defined “social capital” is retrospectively held responsible for certain cities being more prosperous and stable.

Clearly, the modern presentation of social capital can benefit from a more balanced view. The author intends to do just that in this article.

Pierre Bourdieu, the first person to systematically analyze the concept, defined social capital as “the aggregate of the actual or potential resources which are linked to possession of a durable network of more or less institutionalized relationships of mutual acquaintance or recognition”. This analysis did not become well known amongst researchers, largely because the original paper was in French. Bourdieu makes the point that it is the benefits that members accrue, from being part of a social network, that give rise to the strength and stability of such networks. Social networks do not just come into place on their own. People have to invest time and effort into building social ties. However, once this network is in place, the members of this network can appeal to the institutionalized norms of group relations to gain social capital. In some sense, this is like a monetary investment that can pay dividends later.

Because spending this social capital may lead to the acquisition of economic capital in the form of loans, investment tips, etc, Bourdieu thinks of social capital as completely interchangeable with economic capital. However, the acquisition of social capital is much less transparent, and much more uncertain than the process of acquisition of economic capital. It requires the investment of both economic and cultural resources, and involves uncertain time horizons, unspecified obligations, and the possible violation of reciprocity. If you help your friend today, it is not completely certain that you will ever need their help in the future, and that they will help you if you do.

Another contemporary researcher who has probed this realm is Glenn Loury (1977), who stated that economists studying racial inequality and welfare focused too much on human capital (which might perhaps be interpreted as individual education or ability), and the creation of a level playing field such that only the most skilled persons succeed. They want to create a level playing field by making employers’ racial preferences illegal. However, this cannot succeed because the acquisition of human capital by some communities is stunted due to a lack of economic resources, along with the absence of strong social networks.

The merit notion that, in a free society, each individual will rise to the level justified by his or her competence conflicts with the observation that no one travels that road entirely alone. The social context within which individual maturation occurs strongly conditions what otherwise equally competent individuals can achieve. This implies that absolute equality of opportunity…is an ideal that cannot be achieved. (Loury 1977)

Although Loury’s analysis of social capital and social networks stopped here, it led Coleman to delve into the issue in more detail and describe how social capital leads to the acquisition of human capital. Coleman defined social capital as “a variety of entities with two things in common: They all consist of some aspect of social structures, and they facilitate certain action of actors - whether persons or corporate actors - within the structure”. Coleman also described some things that lead to the generation of social capital, like reciprocity expectations and group-enforced norms, along with the consequences of having social capital, like privileged access to information. Resources obtained through social capital are often looked upon as “gifts”, and hence one must distinguish between the possession of social capital, and the ability to acquire these gifts through it. Not everyone who possesses social capital, by virtue of being a member of a social group, can necessarily acquire these gifts without some requisite social savvy.

Another distinction that should be made is between the motivations of recipients and donors. Although the motivations of recipients are fairly clear, donors can either be motivated by reciprocity from the individual that they’re helping, or greater status in the community; or perhaps both.

Coleman also talks about the concept of “closure” in communities, which is the presence of sufficient ties in a community that guarantee the observance of norms. For instance, the possibility of malfeasance in the tightly knit community of Jewish diamond traders in New York is pretty low because of “closure”. This leads to easy transactions between members without going into much legalese.

Another interesting perspective on social capital is offered by Burt, who says that it is the relative absence of ties, called “structural holes”, that facilitates individual mobility. This is because in dense networks, after a certain amount of time, new information is scarce, and only redundant information gets transmitted. It is the weak ties in a sparse network that can suddenly become active, and transmit useful and new information, that can lead to new contacts, jobs, etc. Hence, it is the weaker networks that mostly lead to advancement as opposed to the stronger or denser ones. This is in stark contrast to the stance taken by others like Coleman, who emphasize that the benefits that can be accrued through a social network are directly dependent on how dense that network is.

What motivates a donor to help out a person asking for help in a social network? This motivation can be consummatory or instrumental. A consummatory motivation is one that stems from a sense of duty or responsibility. For instance, the economically well-off members of a tightly knit community might feel an obligation to help out those who are less privileged. An instrumental motivation is one that stems from expectation of reciprocity. Donors help others only to accumulate obligations, and expect to be repaid in full at some time in the future. This is different from an economic exchange however, because the method of repayment can be different from the original method of payment, and also because the time frame of repayment is more uncertain.

There is another set of examples that explains this dichotomy of motivations. Bounded solidarity refers to the mechanism through which an underprivileged or sidelined community develops a sense of solidarity, and all members feel a duty to help each other out. This is an example of consummatory motivation. On the other hand, sometimes donors help out others only to raise their status in society. Hence, although there is an expectation of reciprocity, it is from the whole community and not from an individual. This again is an example of instrumental motivation. Of course the two motivations can be mixed: a donor may extend a loan to another member of a community to both gain status, and also expect individual reciprocity in that they may expect that the money be returned in time. The strong ties in the social network would ensure that both happen.

The three basic functions of social capital are

- As a source of social control- Parents, the police, etc use their social capital to influence the behavior of others in the community. For instance, parents may expect their children to behave well by using their social capital, which they possess by virtue of being guardians of their children. This social capital, if one may imagine it to be some sort of money, is never really spent and exhausted. Parents will always have an infinite amount of social capital to control the behaviour of their children. The same goes for policemen, etc.
- As a source of family support- Children may use their social capital, which they possess by virtue of being dependent on their parents, to expect help from their parents in all spheres of life. This form of social capital is also inexhaustible. It has been noted that children who are brought up in a stable household with two parents often experience better success in their education and careers. On the other hand, children brought up in one-parent households face a harder time dealing with their education and career. This is mainly because children in one-parent households have less social capital, in that they have one less parent to ask for help from.
- As a source of benefits through extrafamilial networks- This one is slightly more intuitive. Connections made outside one’s family can have a huge impact on individual mobility and careers. For instance, Jewish immigrants in New York at the turn of the 20th century often received help from other immigrants in the form of small loans or employment in companies. They had social capital just by virtue of belonging to the same community. Other examples of this phenomenon are New York’s Chinatown, Miami’s Little Havana, etc.

On the flip side, a lack of connections can spell doom for certain communities. For example, impoverished communities rarely have connections with the better parts of town which might provide them with employment or relief. For instance, inner-city impoverished black communities often lack connections with potential sources of employment, and remain mired in poverty. This problem is further exacerbated by the dense social network existing between members of this community, which leads them to influence each other to pursue crime and drug abuse.

Stanton-Salazar and Dornbusch have found a positive correlation between the existence of such extrafamilial social networks and academic achievement amongst Hispanic students in San Francisco. On a side note, they found an even higher correlation between bilingualism and academic achievement, highlighting the importance of being able to communicate with a wider community.

This form of social capital is exhaustible, in that you cannot keep asking the wider community for help and not expect people’s patience to run out.

It is important to identify the negative with the positive. Recent studies have noted four negative consequences of the existence of social networks:

- Strong social ties within a community can **bar access to others**. For instance, business owners from some ethnic communities often employ only members of the same community. Control of the produce business on the East Coast by the Korean community, control of the diamond business in New York by the Jewish community, etc are examples.
- Successful members of a community are often **assaulted by job-seeking kinsmen**. The strong social ties often force these otherwise successful professionals/businessmen to help out or hire their kinsmen, affecting the overall quality and performance of their organizations.
- Social control can lead to demands of **excessive conformity**. In tightly knit traditional societies today, divorces are still looked down upon for instance, and errant members are ostracized. Privacy and individualism are reduced in this way.
- Marginalized communities often develop a strong sense of solidarity, and are apprehensive about mixing with the rest of the population and going up the social and career ladders. Consider the following quote from a Puerto Rican laborer, for instance

(Bourgois 1991, p. 32)

“When you see someone go downtown and get a good job, if they be Puerto Rican, you see them fix up their hair and put some contact lenses in their eyes. Then they fit in and they do it! I have seen it! … Look at all the people in that building, they all “turn-overs.” They people who want to be white. Man, if you call them in Spanish it wind up a problem. I mean like take the name Pedro-I’m just telling you this as an example-Pedro be saying (imitating a whitened accent) “My name is Peter.” Where do you get Peter from Pedro?”

Decades and centuries of discrimination or persecution often lead to certain communities becoming closed to the outside world, which removes them from the larger social network that could perhaps have helped them succeed. This self-imposed exclusion makes their situation even worse than before. These are known as downward leveling norms. Moreover, members that step outside of these communities are often ostracized, which leads to low overall member mobility.

Some political scientists have also extended the notion of social capital to cities and communities, renaming it as “civicness”. This “civicness” or social capital of a community encompasses “the features of a social organization, like networks, norms and trust, that facilitate action and cooperation for mutual benefit”. There is no information on the number of people involved in this social network, the density of the social network, etc.

Robert Putnam, a prominent advocate of the community view of social capital, said that the decline in the nature of cities and the community in general is a result of the loss of social capital through the falling membership of organizations like PTA, the Elks Club, the League of Women Voters, etc. Critics have called this view elitist for stating that social capital can only be regained through membership in these high society organizations. Moreover, they have also admonished Putnam’s opinion that the responsibility for increasing this social capital lies in the hands of the masses by joining these organizations, and not in the hands of the government or corporate leaders.

Dr Portes notes that Putnam’s argument is also circular. The social capital of a community cannot be measured directly. It can only be inferred from a community’s success. If a community is successful, one may infer that there probably is a strong sense of cohesion between the members. Anything that cannot be measured directly, and can only be inferred, cannot scientifically be considered to be a cause. For instance, “emotional balance” cannot scientifically be considered a cause of a person’s success. There may be lots of causes of their success, like hard work or networking. Emotional balance can only be inferred: because this person is successful, they probably do have emotional balance. In this way, identifying one single cause for the success of a person or community, especially if that cause can only be inferred and not measured directly, is a dangerous game. The only way that Putnam could have proven his thesis is by taking two communities that are similar in all regards except that one has more social capital than the other (we are assuming that this social capital can be directly measured), and showing that the one with more social capital is more successful. This is obviously a difficult experiment to conduct in real life.

“Social capital” is essentially a mix of old ideas in a new garb. Moreover, it is unlikely that just *increasing* social capital will lead to a solution of community-wide problems. As has been explained above, social capital is also responsible for holding back certain communities from development. Hence, appreciating both the positives and negatives of social capital is important for having a balanced and realistic view of the concept.

- Social Capital: Its Origins and Applications in Modern Sociology, by Alejandro Portes.


I’ve often thought about whether “blacking” out the amount donated would be a more altruistic thing to do. However, this is not a one-off donation. This is supposed to be 10% of your income, and the amount donated should reflect that. I might still black things out in the future to make these posts less awkward.

How have I been able to afford it? I don’t drink, or eat out that much anymore. My other expenses have generally only reduced over time. Hence, I am perhaps just substituting one form of expenditure with another. I have tried donating to friends’ charities and things like that before. However, on average, I’m happiest donating to just EA and Arxiv.

Notation: For this paper, $X$ denotes a topological space, and $X_n$ a sample of $n$ points from it.

Topological Data Analysis works on the assumption that the topology of data is important. But why? Let us take an example from physics. Suppose we want to study the energy emitted by atoms after absorbing sunlight. We observe that energy emitted by atoms forms a discrete set, and not a continuous one. We take a large number of readings, and observe the very same phenomenon. Hence, we can conclude with a high degree of probability that the topology of the energy states of atoms is discrete. This is one of the most fundamental discoveries of modern Science, and heralded the Quantum revolution.

It becomes clear from the above example that understanding the topology of data can lead us to understand the universe around us. However, we have to “guess” this topology of the “population” from a much smaller “sample”. Topological Data Analysis can be thought of as the study of making “good” guesses in this realm.

Let us take a metric space $(M, \rho)$. The Hausdorff distance between two compact sets $A, B \subseteq M$ is defined as $d_H(A, B) = \max\{\sup_{a \in A} \rho(a, B), \sup_{b \in B} \rho(b, A)\}$, where $\rho(a, B) = \inf_{b \in B} \rho(a, b)$. It is essentially a measure of how “similar” two compact sets look. We need compactness, or rather boundedness, because we need the suprema to be well-defined (finite). However, what if $A, B$ are not necessarily subsets of the same space? Gromov generalized the above definition in the following way: The Gromov-Hausdorff distance between two compact sets $A, B$ is the infimum of all positive real numbers $r$ such that $d_H(\phi_A(A), \phi_B(B)) \leq r$, where $\phi_A, \phi_B$ are isometric embeddings of $A, B$ in some manifold $N$. Essentially, we want to see how “close” the two sets can get across all possible isometric embeddings in all possible manifolds. As one can imagine, calculating it is a seemingly impossible task in most situations, and it is primarily useful when an upper bound to it implies other useful mathematical facts.
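For finite point sets, the Hausdorff distance can be computed directly from its definition. The following is a minimal sketch, assuming the sets are lists of coordinate tuples with the Euclidean metric; the function name `hausdorff_distance` is just an illustrative choice.

```python
import math

def hausdorff_distance(A, B):
    """Hausdorff distance between two finite, non-empty point sets (Euclidean metric)."""
    def dist_to_set(p, S):
        # distance from the point p to the nearest point of S
        return min(math.dist(p, q) for q in S)

    sup_a = max(dist_to_set(a, B) for a in A)  # how far A sticks out of B
    sup_b = max(dist_to_set(b, A) for b in B)  # how far B sticks out of A
    return max(sup_a, sup_b)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff_distance(A, B))  # 2.0: the extra point of B is 2 away from A
```

Every point of `A` also lies in `B`, so only the extra point `(1.0, 2.0)` contributes to the distance.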

Now given a set $\{x_0, \dots, x_k\} \subset \mathbb{R}^d$ of $k+1$ affinely independent points, we can make a $k$-simplex in $\mathbb{R}^d$, namely the convex hull of the points. A simplicial complex $K$ is a collection of simplices such that

- Any face of a simplex is also a simplex. We need this condition because we want to define a boundary map from the simplicial complex to itself, which will allow us to calculate topological invariants like the homology of the complex.
- The intersection of two simplices is either empty, or a common face of both. This condition ensures that only topological “holes” are detected in homology groups, and not other topological features.

Note that a simplicial complex $K$ can be thought of as both a topological space and a combinatorial entity. The topological perspective is useful when we’re trying to break up a topological space into simplices in order to calculate homology, and the combinatorial perspective is useful when we use simplices to represent mathematical entities that are not originally topological spaces. For instance, polynomial rings can also be studied using simplicial complexes, where each vertex corresponds to a homogeneous polynomial. Of course, each combinatorial simplicial complex can be given a topological description. Also, note that the combinatorial description is more suitable than the topological one in the realm of algorithms and machine learning.

**Vietoris-Rips Complex**: Given a set of points $X \subset \mathbb{R}^d$ with pre-determined distances, form all simplices of the form $[x_0, \dots, x_k]$ whenever the distance between any pair of points in $\{x_0, \dots, x_k\}$ is at most $\alpha$. If $k > d$, we might even have simplices that cannot be fit into $\mathbb{R}^d$. The set of all such simplices forms a simplicial complex $\text{Rips}_\alpha(X)$ in some $\mathbb{R}^{d'}$, where $d'$ might be bigger than $d$. As we increase the value of $\alpha$, this simplicial complex, and consequently its homology groups, will change.
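The construction can be sketched by brute force for a small point cloud. This is only an illustration, assuming Euclidean distances and simplices stored as tuples of point indices; `rips_complex` and the `max_dim` cut-off are hypothetical names, and real libraries use far more efficient algorithms.

```python
import math
from itertools import combinations

def rips_complex(points, alpha, max_dim=2):
    """Simplices (as index tuples) whose vertices are pairwise within distance alpha."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= alpha}
    simplices = [(i,) for i in range(n)]  # every point is a 0-simplex
    for k in range(1, max_dim + 1):
        for cand in combinations(range(n), k + 1):
            # a k-simplex is included when all of its edges are present
            if all(pair in close for pair in combinations(cand, 2)):
                simplices.append(cand)
    return simplices

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(rips_complex(square, 1.0)))  # 8: four vertices and the four sides
print(len(rips_complex(square, 1.5)))  # 14: diagonals and four triangles appear
```

At scale 1.0 the complex is a hollow square (a 1-cycle); by scale 1.5 the diagonals have entered and the hole is filled, which is the kind of change in homology mentioned above.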

**Čech Complex**: Given a set of points $X$ with pre-determined distances, we form $k$-simplices when the $\alpha$-balls of $k+1$ points have a non-empty intersection. Hence, the maximum distance between two points now has to be $2\alpha$, and not $\alpha$. A Čech complex is denoted as $\text{Cech}_\alpha(X)$. How is this different from $\text{Rips}_{2\alpha}(X)$ though? As shown in the diagram above, three pairwise intersecting discs give us a $2$-simplex in $\text{Rips}_{2\alpha}(X)$, but just a $1$-cycle in the corresponding Čech complex.
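The difference between the two complexes can be checked numerically for three points. The sketch below assumes closed Euclidean balls in the plane and uses the fact that three closed balls of radius $\alpha$ share a point exactly when the smallest ball enclosing the three centres has radius at most $\alpha$; the function names are illustrative only.

```python
import math

def min_enclosing_radius(p, q, r):
    """Radius of the smallest ball containing three points."""
    a, b, c = sorted([math.dist(q, r), math.dist(p, r), math.dist(p, q)])
    if a * a + b * b <= c * c:
        return c / 2  # right, obtuse or degenerate: the longest side is a diameter
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    return a * b * c / (4 * area)  # circumradius of an acute triangle

def in_rips(p, q, r, alpha):
    """Is the triangle a 2-simplex of the Rips complex at scale 2 * alpha?"""
    return max(math.dist(p, q), math.dist(p, r), math.dist(q, r)) <= 2 * alpha

def in_cech(p, q, r, alpha):
    """Is the triangle a 2-simplex of the Cech complex at radius alpha?"""
    return min_enclosing_radius(p, q, r) <= alpha

# equilateral triangle with side 1; its circumradius is 1 / sqrt(3), about 0.577
tri = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(in_rips(*tri, 0.55), in_cech(*tri, 0.55))
print(in_rips(*tri, 0.60), in_cech(*tri, 0.60))
```

For a radius between 0.5 and roughly 0.577 the three discs intersect pairwise but have no common point, so the Rips complex fills in the triangle while the Čech complex keeps the hole.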

**The Nerve Theorem**: Given a cover $\mathcal{U} = (U_i)_{i \in I}$ of a manifold, we can form a Čech complex from it (note that the open sets here are the $U_i$, and not $\alpha$-balls as stated before). The Nerve Theorem says that if the intersection of any sub-collection of the $U_i$ is either empty or contractible, then the Čech complex is homotopy equivalent to the manifold.

The contractibility of intersections ensures that we do not “cover up” any holes in the manifold using our discs. But why the Čech complex, and not the Rips complex? This is because a hole would be covered up by three pair-wise intersecting discs in a Rips complex, but not in the Čech complex. Hence, the Čech complex is a useful way of preserving the homology of the underlying manifold.

Why is this theorem important? This is because it takes a continuous property of a topological space, and converts it into a combinatorial entity. For instance, if I only needed to know the homotopy class of a manifold for my problem, studying a simplicial complex on the manifold with the same homotopy type is much easier for me, as I can now feed the combinatorial data of the simplicial complex into well-known algorithms to make deductions about the space.

For a space $X$, let $f: X \to \mathbb{R}^d$ be a continuous map. For a cover $\mathcal{U} = (U_i)_{i \in I}$ of $\mathbb{R}^d$, the sets $f^{-1}(U_i)$ form a cover of $X$. Now consider the set of all connected components of the $f^{-1}(U_i)$. If the function $f$ and the open cover $\mathcal{U}$ are well chosen, the nerve of this “refined” cover gives us useful information about the underlying space $X$. Note that the $f^{-1}(U_i)$ don’t have to be contractible anymore. An example is given below:

The function $f$ here is the height function (look out for the slightly camouflaged yellow parts). This nerve of the two-holed torus does a decent job of representing its topology, although we fail to detect the small hole at the bottom because the cover chosen of $\mathbb{R}$ is not fine enough.

In practice, however, we don’t map continuous manifolds, but just data points. A suitable example is given below:

From the nerve drawn on the right, one may conclude that the topology of the underlying population, from which the data has been extracted, is a circle.

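Some popular choices of $f$ are the centrality function $f(x) = \sum_{y \in X_n} d(x, y)$, or the eccentricity function $f(x) = \max_{y \in X_n} d(x, y)$, where $X_n$ is the set of data points. These functions do not require any special knowledge of the data.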
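Both filter functions need nothing beyond pairwise distances, as the following small sketch shows. It assumes the data is a list of coordinate tuples with the Euclidean metric; the function names are illustrative.

```python
import math

def centrality(x, data):
    """Sum of the distances from x to every data point."""
    return sum(math.dist(x, y) for y in data)

def eccentricity(x, data):
    """Distance from x to the farthest data point."""
    return max(math.dist(x, y) for y in data)

data = [(0.0,), (1.0,), (3.0,)]
for x in data:
    print(x, centrality(x, data), eccentricity(x, data))
```

On this toy data set the middle point `(1.0,)` has the smallest value of both functions, and the points at the ends have larger ones.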

As one may imagine, the choice of the cover $\mathcal{U}$ determines the nerve that we get from the Mapper algorithm. Generally, one may choose regularly spaced rectangles which overlap with each other. If $f: X \to \mathbb{R}$, then the length of the intervals is known as the resolution, and the fraction of overlap is denoted as $g$. One must explore various resolutions and values of $g$ in order to find the “best” nerve of $X$.

Now remember that the connected components of the $f^{-1}(U_i)$’s form just the vertex set of the simplicial complex we are building. Although we could build a Čech complex from these pre-images, we may also cluster the vertices corresponding to the $f^{-1}(U_i)$’s in other ways. For instance, we may build a k-NN graph with the points in $X_n$, associate points to each $f^{-1}(U_i)$, cluster them appropriately, and then only select the connected components of the subgraph whose vertices correspond to the $f^{-1}(U_i)$’s. This is a different algorithm because we don’t care if the $f^{-1}(U_i)$’s intersect anymore.
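The whole pipeline (filter function, overlapping cover, clustering of pre-images, nerve) can be sketched in a few lines for a real-valued filter. This is a toy version under several assumptions that are mine rather than part of any standard implementation: the cover is given by a number of equal intervals and the overlap fraction (the resolution is derived from them), clusters are the connected components of the graph joining points closer than a threshold `eps`, and only the vertices and edges of the nerve are built.

```python
import math
from itertools import combinations

def connected_components(indices, points, eps):
    """Clusters = components of the graph joining points at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters

def mapper_graph(points, f_values, n_intervals, gain, eps):
    """Vertices (clusters of pre-images) and edges (clusters sharing a point)."""
    lo, hi = min(f_values), max(f_values)
    # interval length (resolution) chosen so n_intervals overlapping intervals cover [lo, hi]
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * gain)
    step = length * (1 - gain)
    tol = 1e-9  # guards against rounding at the two ends of the range
    vertices = []
    for i in range(n_intervals):
        start = lo + i * step
        pre = [j for j, v in enumerate(f_values)
               if start - tol <= v <= start + length + tol]
        vertices.extend(connected_components(pre, points, eps))
    edges = [(a, b) for a, b in combinations(range(len(vertices)), 2)
             if vertices[a] & vertices[b]]
    return vertices, edges

# 40 points on the unit circle, filtered by their first coordinate
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
f_values = [p[0] for p in circle]
vertices, edges = mapper_graph(circle, f_values, n_intervals=3, gain=0.25, eps=0.2)
print(len(vertices), len(edges))  # 4 4
```

With these parameters the middle interval pulls back to two arcs (top and bottom) and the outer intervals to one arc each, so the nerve is a cycle on four vertices, matching the circle the points were sampled from.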

Let us now bring probability into the mix. Suppose we have a space $X$ with a probability measure $\mu$. If we take a sample of points $X_n$, we want the topology of $X_n$ to somehow approximate the topology of $X$.

Now, we introduce some mathematical definitions. For a compact space $K$, the $r$-offset $K^r$ is defined as the set of all points $x$ such that $d(x, K) \leq r$. Why do we care about $r$-offsets? Because they are much better at capturing the topology of the space around them. For instance, the homology of a loop is clearly different from the Euclidean space it is contained in. However, for appropriate values of $r$, the $r$-offset of the loop becomes contractible, and hence has the same homology as the surrounding space.

A function $\phi: \mathbb{R}^d \to \mathbb{R}_+$ is distance-like if it is proper, and $x \mapsto \|x\|^2 - \phi^2(x)$ is convex. We need the proper condition because we don’t want unbounded sets to only be at finite distance from a point. Hence, properness ensures that only bounded sets are at a bounded distance from a point (with respect to which a distance function is defined). The second condition is that of semi-concavity, and I would like to re-write it in its more natural form: $\phi^2(x) - \|x\|^2$ is concave. The reason that it is called “semi”-concave is that it is concave only when a very concave function is added to it. $\phi^2$ can actually be a convex function itself. Why do we want this condition here? This is because we want to generate a continuous flow using $\phi$, and functions that “rise too fast” may have discontinuities when we generate this flow.

A point $x$ is said to be $\alpha$-critical if $\|\nabla_x \phi\| \leq \alpha$. It is a generalization of the notion of a critical point (where $\alpha = 0$). We want to find the largest $r$ such that $\phi^{-1}((0, r])$ does not have any $\alpha$-critical points. This is known as the $\alpha$-reach of $\phi$. What this means is that the level sets $\phi^{-1}(t)$ for $0 < t \leq r$ will “flow” along the gradient of $\phi$ at a speed that is faster than $\alpha$, at least until they reach the level set $\phi^{-1}(r)$. Why do we want level sets flowing into each other at all? This is because of their relation to Čech complexes. Consider the level sets in two inverse images of open sets. If the level sets of the first can “flow” into the level sets of the second, then there is no hole between them, and the two vertices corresponding to these inverse open sets can be joined by a line regardless of how these vertices have been clustered.

**Isotopy Lemma**: Let $\phi$ be a distance-like function and $r_1 < r_2$ be such that $\phi^{-1}([r_1, r_2])$ has no critical points. Then the level sets $\phi^{-1}([0, r])$ of $\phi$ are isotopic for $r \in [r_1, r_2]$. Two topological sets are isotopic when they are homeomorphic, and one can continuously be deformed into the other. Essentially, the level sets in $\phi^{-1}([0, r_1))$ can stay where they are, and the level sets inside $\phi^{-1}([r_1, r_2])$ can move because there are no critical points inside. Whether there are critical points in $\phi^{-1}([0, r_1))$ is irrelevant.

**Reconstruction Theorem**: This theorem essentially states that when two distance-like functions $\phi$ and $\psi$ are “close enough”, suitable sub level sets of both are homotopically equivalent. Of course it is unclear from the statement how these level sets “flow” into each other, and which function’s gradient field is chosen for this. Why is this theorem important?

Let $\phi = d_X$ and $\psi = d_{X_n}$, which are the distance functions to the support of $\mu$ on $X$ and to the sample $X_n$ respectively. The Reconstruction Theorem can tell us that for appropriate values of $\eta$ and $r$, the $\eta$-offset of $X$ is homotopically equivalent to the union of the $r$-offsets of the points in $X_n$, which in turn is homotopically equivalent to the Čech complex formed by these offsets. Essentially, the Reconstruction Theorem provides the basis for studying the topology of $X$ using the nerve of the offsets of $X_n$.

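An important result of Chazal and Oudot in this direction is the following: For a compact set $K$, let the $\alpha$-reach of $K$ be $R > 0$. Also, let $X_n$ be a set of points whose Hausdorff distance $d_H(K, X_n)$ is a small enough fraction of $R$. Then for $r$ in a suitable range, the $k$th Betti number of a suitable offset $K^\eta$ equals the rank of the map $H_k(\text{Rips}_r(X_n)) \to H_k(\text{Rips}_{r'}(X_n))$ induced by inclusion, for a suitable larger scale $r' > r$. Here a suitable $\eta$ has been chosen. Why is this theorem important? Because it allows us to calculate the Betti numbers of $K$ using just information gleaned from the Rips complexes of $X_n$.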

Note that in all of the theorems discussed above, $d_X$ and $d_{X_n}$ have been “pretty close” as metrics. This forms the basis of all our topological deductions. What if they’re not? What if we have outlier points in $X_n$? Note that it is not in general possible to select a point from outside of the support of the probability measure on $X$, as the probability of selecting a point outside of it is by definition $0$. Hence, the existence of such a point is purely due to noise, which brings in a probability distribution that is independent of $\mu$.

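To deal with noise of this sort, we have the notion of “distance to a measure”. Given a probability distribution $\mu$ and a given parameter $0 \leq u < 1$, $\delta_{\mu, u}(x) = \inf\{t > 0 : \mu(B(x, t)) \geq u\}$, where $B(x, t)$ is the closed ball of radius $t$ around $x$.

Note that this map can be discontinuous if the support of $\mu$ is badly behaved. Hence, to further regularize this distance, we define $d_{\mu, m}(x) = \sqrt{\frac{1}{m} \int_0^m \delta_{\mu, u}^2(x) \, du}$ for a parameter $0 < m < 1$.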

A nice property of $d_{\mu, m}$ is that it is stable with respect to the Wasserstein metric. In other words if $\mu, \nu$ are two probability measures, then $\|d_{\mu, m} - d_{\nu, m}\|_\infty \leq \frac{1}{\sqrt{m}} W_2(\mu, \nu)$. Hence, $d_{\mu, m}$ is a good distance-like function to consider to analyze the topology of $X$, or at least the support of $\mu$.

In practice, $\mu$ is not known. We can only hope to approximate it from $X_n$. Consider the probability measure $\mu_n$ on $X_n$. Although the exact construction of $\mu_n$ is not mentioned (it might just be a discrete measure), the following formula has been mentioned: for $m = \frac{k}{n}$

$d_{\mu_n, m}^2(x) = \frac{1}{k} \sum_{j=1}^{k} \|x - X_n\|_{(j)}^2$,

Here $\|x - X_n\|_{(j)}$ denotes the distance between $x$ and its $j$th neighbor in $X_n$. If the Wasserstein distance between $\mu$ and $\mu_n$ is small, which is what we can hope if we take a large enough sample from the population, then $d_{\mu_n, m}$ is pretty close to $d_{\mu, m}$ in the $\|\cdot\|_\infty$ norm.
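The empirical distance to a measure is just a root mean square of the distances to the $k$ nearest sample points, so it is easy to compute. Below is a minimal sketch, assuming Euclidean distances and a brute-force neighbor search; the name `empirical_dtm` and the toy sample are illustrative choices.

```python
import math

def empirical_dtm(x, sample, k):
    """Distance to the empirical measure of `sample`, with mass parameter m = k / len(sample)."""
    squared = sorted(math.dist(x, p) ** 2 for p in sample)
    return math.sqrt(sum(squared[:k]) / k)  # RMS distance to the k nearest sample points

# four points of a unit square plus one far-away outlier
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]

outlier = (10.0, 10.0)
print(min(math.dist(outlier, p) for p in sample))  # 0.0: plain distance to the sample
print(empirical_dtm(outlier, sample, k=2))         # about 9.0
print(empirical_dtm((0.0, 0.0), sample, k=2))      # about 0.707
```

The outlier is at distance zero from the sample because it belongs to it, yet its distance to the measure stays large. This is the robustness to noise that motivates the definition.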

Persistent homology is used to study how the Čech complexes and Betti numbers change with a parameter (generally the radius of the overlapping balls). Why is it important? We can never really be sure if the homology of the Čech complex that we have is the same as the homology of the underlying space. Hence, why not study all possible homologies obtained from an infinite family of Čech complexes? Two spaces with “similar” such families of homologies are likely to be the “same” in some well defined topological sense. Hence, this is another attempt at determining the topology of the underlying space.

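A filtration of a set $X$ is a family of subsets $(F_r)_{r \in T}$ such that $F_r \subseteq F_{r'}$ whenever $r \leq r'$. Some useful filtrations are the family of simplicial complexes $(\text{Rips}_r(X_n))_r$ or $(\text{Cech}_r(X_n))_r$.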

Given a filtration $(F_r)_{r \in T}$, the homology of $F_r$ changes as $r$ increases. Consider the persistence diagram below:

I will briefly try to explain what is happening here. $f$ is the height function defined on this graph, and the sublevel set $f^{-1}((-\infty, t])$ for small $t$ looks like an interval that is expanding in size. As $t$ passes each local minimum of $f$, a new interval is created, which dies when it joins with some other interval that was created before it. For example, an interval created at a later local minimum joins an interval created at an earlier one when $t$ reaches the height of the local maximum separating them. The pair (value of $t$ at the birth of an interval, value of $t$ at the death of that interval) can be graphed as a point in $\mathbb{R}^2$, as shown by the red dots in the diagram on the right. We also add all diagonal points with infinite multiplicity. One way to interpret that is for all values of $t$, an infinite number of intervals come alive and die. The reason why we add this seemingly arbitrary line is that when we try to sample a population of data points, we might receive some noise. Hence, we can create a small neighborhood of the diagonal, and only interpret the points that lie outside of this neighborhood as genuine topological features of the underlying manifold. The points inside the neighborhood denote topological features that are born and die “too soon”, and hence might just be noise. More will be said about this later.
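The birth and death bookkeeping described above can be sketched for a function sampled along a line. This is a minimal illustration, assuming the function is given as a list of values at consecutive sample points and using a union-find structure in which the younger of two merging intervals dies; it is not tied to any particular library.

```python
def sublevel_persistence(values):
    """0-dimensional (birth, death) pairs for sublevel sets of a function sampled on a line."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent = {}  # union-find forest over the sample points added so far
    birth = {}   # birth value of the interval represented by each root
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:  # sweep the threshold t upwards
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the interval that was born later dies at the current height
                    young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                    if birth[young] < values[i]:  # skip intervals of zero length
                        pairs.append((birth[young], values[i]))
                    parent[young] = old
    pairs.append((birth[find(order[0])], float("inf")))  # the oldest interval never dies
    return sorted(pairs)

heights = [1, 3, 0, 4, 2, 5]
print(sublevel_persistence(heights))  # [(0, inf), (1, 3), (2, 4)]
```

The three local minima at heights 0, 1 and 2 each create an interval; the ones born at 1 and 2 die at the local maxima 3 and 4 where they merge into the older interval born at 0.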

Given below is the corresponding diagram for Čech complexes. The blue dots correspond to the birth and death times of “topological holes”.

Note that in the diagram above, all the balls grow at the same time. Hence, we don’t have a clear way of choosing which red interval should “die” and which one should survive. We arbitrarily decide that if two balls that start at the same time join, the red line below remains, and the one above ends. The persistence diagram is given on the bottom right.

The inclusion diagram $F_r \subseteq F_{r'}$, where $r \leq r'$, induces the inclusion diagram of vector spaces $H_k(F_r) \to H_k(F_{r'})$. The latter inclusion diagram is called a persistence module. Why is this important? Because it shows, in real time, the evolution of the $k$th Betti number of the diagram.

In general, a persistence module over a subset $T$ of the real numbers is an indexed family of vector spaces $(V_r)_{r \in T}$ such that there is a linear map $v_r^s : V_r \to V_s$ when $r \leq s$, along with the property that $v_s^t \circ v_r^s = v_r^t$. In many cases, such diagrams can be expressed as a direct sum of “inclusions” modules of the form

$$\dots \to 0 \to \dots \to 0 \to \mathbb{Z}_2 \to \dots \to \mathbb{Z}_2 \to 0 \to \dots$$

where the maps $\mathbb{Z}_2 \to \mathbb{Z}_2$ are identity maps, and the rest are $0$ maps. In some sense, when the vector spaces in the persistence module are the homology groups $H_k$, we are breaking up the module into one piece for each $k$-dimensional hole, and tracking when each hole appears and disappears. Chazal et al proved that if the map $v_r^s$ in a persistence module has finite rank for each $r < s$, then it can be decomposed as a direct sum of “inclusions” modules in a well-defined way. One way to think of this is the following: a generic “element” of a persistence module is the set of births and deaths of all topological $k$-holes that survive at least one iteration (or increment in the value of $r$), between $r$ and $s$. If all but a finite number of holes die only within one iteration, then each element of the persistence module can be thought of as a finite sum of “inclusions modules”.

The persistence landscape was introduced by Bubenik, who stated that the “dots” in a persistence diagram should be replaced by “functions”. But why? Because there’s not much algebra that we can do with dots. We can’t really add or multiply them in canonical ways. However, functions lend themselves to such operations easily. Hence, the inferences that we make about such functions may help us make inferences about the underlying topological space.

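How do we construct such a function? Take a point $p = (b, d)$ in the persistence diagram. We just construct a function $\Lambda_p$ for the point $p$ that looks like a “tent”, by joining the points $(b, 0)$ and $\left(\frac{b+d}{2}, \frac{d-b}{2}\right)$ by a straight line, and then joining $\left(\frac{b+d}{2}, \frac{d-b}{2}\right)$ and $(d, 0)$ by another straight line. Three such “tents” for three different points are given below. They’re drawn in red.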

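The persistence landscape for this diagram is defined as $\lambda(k, t) = \operatorname{kmax}_{p} \Lambda_p(t)$, where $\Lambda_p$ is the tent function of the point $p$ and $\operatorname{kmax}$ is the $k$th highest value in the set. For instance, the function in blue drawn above is .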
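A minimal sketch of this construction, assuming each diagram point is a (birth, death) pair written `(b, d)`, and that the tent for `(b, d)` rises with slope 1 from `(b, 0)` to a peak at `((b + d) / 2, (d - b) / 2)` and then falls back to `(d, 0)`. The names `tent` and `landscape` are hypothetical.

```python
def tent(b, d, t):
    """Tent function of the diagram point (b, d), evaluated at t."""
    # Rises as t - b, falls as d - t, and is zero outside [b, d].
    return max(0.0, min(t - b, d - t))


def landscape(diagram, k, t):
    """k-th largest tent value at t (k = 1 is the top layer).

    Returns 0.0 when the diagram has fewer than k points."""
    heights = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return heights[k - 1] if k <= len(heights) else 0.0


diagram = [(0, 4), (1, 3), (2, 6)]
for k in (1, 2, 3):
    print(k, [landscape(diagram, k, t) for t in range(7)])
```

Because every layer is an ordinary real-valued function of `t`, layers from different diagrams can be added and averaged pointwise, which is exactly what dots in a diagram do not allow.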

A short note about the axes in the two diagrams above: the and on the left diagram correspond to time of birth and death respectively. For the diagram on the right, the axes denote the coordinates of the black “dots” on the functions. The “tent”-ed functions themselves may be thought of as a progression from left to right, in which a topological feature is birthed and then dies.

One of the advantages of persistence landscapes is that they share the same stability properties as persistence diagrams. Hence, if two probability measures are “close enough”, then their persistence landscapes will also be “quite close”.

We know that when two probability distributions are “close enough”, the distance functions to those probability distributions are also “pretty close”. However, what about the persistence diagrams of those probability distributions? Does the persistence diagrams of two probability distributions being “close” imply that the probability distributions are also close? Before we can answer this question, we must find a good metric to calculate the distance between two persistence diagrams.

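One such metric is the **bottleneck metric**, which is defined as

$$d_B(D_1, D_2) = \inf_{m} \max_{(x, y) \in m} \|x - y\|_\infty$$

Here $m$ is a “matching”, which is the arbitrary bipartite pairing of points in $D_1$ with those in $D_2$. Because the $\infty$ norm is too sensitive to “outliers”, a more robust metric is

$$W_p(D_1, D_2)^p = \inf_{m} \sum_{(x, y) \in m} \|x - y\|_\infty^p$$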

But if two persistence diagrams are “close”, are the underlying probability distributions also “close”? We don’t know. But the converse is true.
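To make the bottleneck metric concrete, here is a brute-force sketch for very small diagrams. It assumes the sup-norm cost between matched points and the usual convention that a point may also be matched to the diagonal at cost `(d - b) / 2`. The function name is hypothetical, and trying every permutation is only feasible for a handful of points; it is not how real libraries compute this distance.

```python
from itertools import permutations


def bottleneck_bruteforce(D1, D2):
    """Exact bottleneck distance for tiny diagrams, by trying every matching."""

    def to_diag(p):
        # Sup-norm distance from the point p = (b, d) to the diagonal.
        return (p[1] - p[0]) / 2.0

    # Pad each diagram with one diagonal slot per point of the other one,
    # so that every point always has the option of matching to the diagonal.
    A = [("pt", p) for p in D1] + [("diag", None)] * len(D2)
    B = [("pt", q) for q in D2] + [("diag", None)] * len(D1)

    def cost(a, b):
        if a[0] == "pt" and b[0] == "pt":
            return max(abs(a[1][0] - b[1][0]), abs(a[1][1] - b[1][1]))
        if a[0] == "pt":
            return to_diag(a[1])
        if b[0] == "pt":
            return to_diag(b[1])
        return 0.0  # diagonal matched to diagonal costs nothing

    best = float("inf")
    for perm in permutations(range(len(B))):
        worst = max((cost(A[i], B[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best


print(bottleneck_bruteforce([(0, 4), (1, 2)], [(0, 5)]))
```

In the example, the short-lived point `(1, 2)` is cheapest to send to the diagonal, so the distance is governed by how far `(0, 4)` is from `(0, 5)`.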

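Let $X$ and $Y$ be two compact metric spaces and let $\mathrm{Filt}(X)$ and $\mathrm{Filt}(Y)$ be the Vietoris-Rips or Čech filtrations built on top of $X$ and $Y$. Then

$$d_B(\mathrm{dgm}(\mathrm{Filt}(X)), \mathrm{dgm}(\mathrm{Filt}(Y))) \leq 2\, d_{GH}(X, Y),$$

where $d_{GH}$ denotes the Gromov-Hausdorff distance.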

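We can also conclude that if two persistence diagrams are close, then their persistence landscapes are also close: Let $X$ and $Y$ be two compact sets. Then for any $t \in \mathbb{R}$ and any $k \in \mathbb{N}$, we have

$$|\lambda_X(k, t) - \lambda_Y(k, t)| \leq d_B(\mathrm{dgm}(\mathrm{Filt}(X)), \mathrm{dgm}(\mathrm{Filt}(Y))).$$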

Whether the closeness of persistence diagrams denotes the closeness of the underlying topological spaces remains woefully unanswered.

While talking about persistence homology, we have only talked about topological spaces, and not necessarily about probability distributions. We do so here.

Let be an underlying space with probability measure , and let be the compact support of this measure. If we take independent readings from this set- say , then we can estimate the space by . The probability measure on has support .

For some , let satisfy the condition . Then

Because we do not exactly know and hence the persistence diagram of , we can only calculate the probability of the persistence diagram of being close to that of . Clearly, as grows big, this probability becomes smaller. This can also be ascertained from the following formula:

approaches as . Here, is some constant.

If two functions $f$ and $g$ on a manifold are “close”, then the persistence diagrams induced by them are also close. More precisely,

$$d_B(\mathrm{dgm}(f), \mathrm{dgm}(g)) \leq \|f - g\|_\infty.$$

This opens up a vista of opportunities, in that we can now study density estimators, regression functions, etc. But how? Suppose we do not know how to calculate the persistence homology of a complicated function. We take a more regular function that is “close” to it in the sup norm, calculate its persistence homology, and can then be assured that the persistence diagram of the complicated function looks almost like the persistence diagram of the more regular, “better” function.

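When estimating a persistence diagram $\mathrm{dgm}$ with an estimator diagram $\widehat{\mathrm{dgm}}$, we look for a value $\eta_\alpha$ such that $P(d_B(\widehat{\mathrm{dgm}}, \mathrm{dgm}) \geq \eta_\alpha) \leq \alpha$, for some $\alpha \in (0, 1)$. The $\eta_\alpha$ gives us an upper bound on how “far” the two diagrams can be.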

In some sense, if we were in the space of persistence diagrams (each point in this space is a persistence diagram), is the -confidence interval of . How does this translate to confidence intervals of the actual points in ? One way to do this is to center a box of side length at each of these points. Another way is to create an neighborhood of the diagonal line in the persistence diagram. The points outside of it are significant topological features of the sample, and are definitely preserved in . This is perhaps an important reason why we include the diagonal in persistence diagrams- points on the diagonal are unimportant topological features that might just be noise. Hence, we can infer the persistence diagram of the underlying space from that of the sample, as long as the points that we get are “far enough” from the diagonal.
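The diagonal-band idea can be sketched as a simple filter. The assumptions here are mine: the confidence radius is passed in as `eta` (a hypothetical parameter name), distances are measured in the sup norm, so a point `(b, d)` lies at distance `(d - b) / 2` from the diagonal, and a point counts as signal only if that distance exceeds `eta`.

```python
def significant_features(diagram, eta):
    """Split a diagram into points outside and inside the band of
    sup-norm width eta around the diagonal."""
    signal = [(b, d) for b, d in diagram if (d - b) / 2.0 > eta]
    noise = [(b, d) for b, d in diagram if (d - b) / 2.0 <= eta]
    return signal, noise


signal, noise = significant_features([(0, 5), (1, 1.2), (2, 2.9)], eta=0.5)
print("signal:", signal)
print("noise: ", noise)
```

The reasoning behind the threshold: if the estimated and true diagrams are within bottleneck distance `eta`, a point further than `eta` from the diagonal cannot be explained by matching it to the diagonal, so it has to correspond to a feature of the true diagram.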

Of course, all of this depends on whether we can successfully approximate the value of from the sample. Methods like the Subsampling Approach and the Bottleneck Bootstrap are important in this context.

Persistence diagrams are just a bunch of dots and a diagonal line. As they’re not elements of a Hilbert space, we cannot determine an “expected” or “mean” persistence diagram. Hence, we need to move to persistence landscapes, which are elements of a Hilbert space, and consequently lend themselves to such an analysis.

For any positive integer , let be a sample of points from . For a fixed , let the corresponding persistence landscape be . Now consider the space of all persistence landscapes (corresponding to different ‘s drawn from ), and let be the induced measure on , which then induces the measure on the space of persistence landscapes. It is now possible to calculate the expected persistence landscape, which is . This quantity is quite stable. In fact, the following is true:

Let and , where and are two probability measures on . For any , we have

Due to the highly non linear nature of persistence diagrams, persistence diagrams first need to be converted into persistence landscapes to be useful in machine learning. Such persistence landscapes have been useful in protein binding and object recognition.

The construction of kernels for persistence diagrams has also drawn some interest. Can we directly compare two persistence diagrams by taking their “dot product” in some sense? Convolving a symmetrized version of a persistence diagram (with respect to the diagonal) with the two dimensional Gaussian measure gives us exactly such a kernel. Such a kernel can be used for texture recognition, etc.
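One kernel of this kind is the persistence scale-space kernel of Reininghaus et al. The sketch below writes it out in the closed form as I recall it, so treat the constants as an assumption and check them against the original paper before relying on them. `sigma` is the scale parameter, and each Gaussian term is paired with a second one centred at the mirror image of the point across the diagonal.

```python
import math


def pss_kernel(F, G, sigma=1.0):
    """Scale-space style kernel between two persistence diagrams.

    F and G are lists of (birth, death) pairs."""
    total = 0.0
    for b1, d1 in F:
        for b2, d2 in G:
            # Squared distance between the two points.
            direct = (b1 - b2) ** 2 + (d1 - d2) ** 2
            # Squared distance to the second point mirrored across the diagonal.
            mirrored = (b1 - d2) ** 2 + (d1 - b2) ** 2
            total += math.exp(-direct / (8 * sigma)) - math.exp(-mirrored / (8 * sigma))
    return total / (8 * math.pi * sigma)


F = [(0.0, 2.0)]
G = [(0.5, 3.0)]
print(pss_kernel(F, G), pss_kernel(G, F))
```

The subtraction of the mirrored term is what makes points on the diagonal contribute nothing, which matches the intuition that such points are noise.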

Sometimes, identifying topological features from a persistence diagram becomes a difficult task. Hence, the choice of a kernel becomes an important factor. Deep learning can also be used to identify the relevant topological features in a given situation.

Spiderman developed his awesome superhuman skills because a radioactive spider bite caused a mutation in his DNA. Genome-editing technologies have chased similar superhuman dreams for a long time now. What if we could edit our DNA itself to give us amazing capabilities, or remove those parts of the DNA that are responsible for our deformations? The first glimmer of hope in this direction was the discovery of restriction enzymes in bacteria, that protected them from invading agents called phages. Restriction enzymes scan DNA molecules, and if they see an “enemy” pattern that they’ve been trained to recognize, they cut the DNA molecules at an appropriate site, effectively rendering the “enemy” gene useless.

With this discovery, scientists were able to manipulate the DNA of cells in test tubes, making similar cuts themselves. However, manipulating the DNA of living cells that were part of a larger living organism (in vivo) remained an elusive dream. This was finally realized by the work of Capecchi and Smithies, who found that mammalian cells could incorporate a foreign copy of DNA into their own genome. This happens through a process called homologous recombination, and is explained here.

So could we just keep introducing desired DNA copies into mammalian cells, and hope that these get incorporated? No. Firstly, only a tiny fraction of cells allowed the foreign DNA to combine with the existing DNA in the cell. Secondly, this foreign material could be incorporated in other parts of the DNA instead of the desired loci. Hence, we needed to get better control over the process.

Researchers soon realized that if there was a break in both strands of the DNA at the desired site, called a double-strand break or DSB, the frequency of the foreign material getting attached there would increase by orders of magnitude. This led to a lot of research into large “cutting” molecules, or meganucleases. These meganucleases could recognize strands that were 14-40 base pairs long, and then cut the genes at the desired site. This was problematic however, because scientists couldn’t find meganucleases for all the sites of interest to them. New meganucleases were not easy to engineer. Moreover, the meganuclease-induced DSBs are mostly repaired by non-homologous end joining (NHEJ), which may be thought of as a rough and slipshod method of joining ends of the DNA. Hence, this method would not be suitable for introducing the desired foreign DNA in the correct manner into the genes.

This problem was partially solved when **zinc finger proteins** (ZFPs) were discovered. Instead of sites 14-40 base pairs long, these proteins could recognize sites that were only 3 base pairs long. Hence, given the (possibly long) base pair configuration of a desired site, we could attach multiple such zinc finger proteins to match the sequence at the desired site. In this manner, scientists could manipulate many more sites on the genome than before. Note that the zinc finger proteins would not be performing the actual cleavage: this would be performed by an endonuclease called Fok I, to which the zinc finger proteins would be bound. ZFPs can be thought of as the search party, while Fok I holds the actual knife for cleaving.

The situation was further improved when scientists discovered TALE proteins, which could now recognize just 1 base pair instead of 3 bp long sites. However, even with this discovery, a lot of difficult engineering and re-engineering of proteins was required to target all possible sites of interest. The CRISPR gene editing technology turned out to be just as robust as these technologies, if not more, and also much easier to use!

CRISPR stands for **c**lustered **r**egularly **i**nterspaced **s**hort **p**alindromic **r**epeat DNA sequences. These highly repeating DNA sequences, interspersed with non-repeating spacer genomic sequences, were first observed in Escherichia coli, although they were later observed to be present in more than 40% of all bacteria and 90% of archaea. These CRISPR sequences form the backbone of the bacterial immune response to invasion by bacteriophages. A horrifying video of such a bacteriophage invasion is present here. In response to this invasion, the bacteria would store a part of the foreign DNA of the invaders in the form of spacers. Hence, the CRISPR may be thought of as a library in which you keep records of all the invaders that have wronged you. In the future, if the same phages attacked the bacteria again, their DNA strands would get recognized and destroyed.

How does all this happen though? After detecting and storing the DNA sequence of the invading genome, the CRISPR system makes copies of this sequence and stores them in two short RNAs- the mature crRNA and the trans-activating crRNA. Both of these RNAs activate the Cas9 enzyme, which goes in search of this particular DNA sequence. When it does detect the sequence, it cleaves the genome, rendering it useless. However, the CRISPR system itself also contains a copy of this sequence! How does the Cas9 protein know that it should not cleave the CRISPR DNA sequence? This is because of protospacer-adjacent motifs or PAMs. PAMs are base pair sequences that are present in the invading genome but not in the CRISPR sequence. Hence, before cleaving, the Cas9 enzyme checks if the genome contains the relevant PAM or not, and cleaves the DNA sequence only if such a PAM exists.
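The PAM check can be pictured as a plain string search. The sketch below is a toy model, not a description of the molecular mechanism: it assumes the 5′-NGG-3′ PAM of the Streptococcus pyogenes Cas9, scans only one strand, requires an exact match, and uses an 8-letter guide instead of the usual 20 to keep the example readable. The function name `find_cas9_targets` is hypothetical.

```python
def find_cas9_targets(genome, guide, pam_len=3):
    """Start indices where `guide` matches exactly and is followed by an NGG PAM."""
    hits = []
    n = len(guide)
    for i in range(len(genome) - n - pam_len + 1):
        if genome[i:i + n] == guide:
            pam = genome[i + n:i + n + pam_len]
            if pam[1:] == "GG":   # N means any base, then two G's
                hits.append(i)
    return hits


guide = "ACGTACGT"
# First copy is followed by AGG (a valid PAM), second copy by ATT (not a PAM).
genome = "TT" + guide + "AGG" + "CC" + guide + "ATT"
print(find_cas9_targets(genome, guide))
```

The second copy of the sequence plays the role of the spacer stored in the CRISPR array: the sequence matches, but with no PAM next to it the site is left alone.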

Scientists soon realized that they don’t need to go through the whole shindig of first letting a foreign genome attack a cell, and only then getting the required genome sequence in order to look for DNA sites to cleave. They could just directly engineer the two crRNAs containing information about the DNA site which they wished to have cleaved, and the Cas9 enzyme would do the rest. Better still, instead of two, they could just manufacture one RNA- the single guide RNA or **sgRNA**! This idea caught on pretty quickly, and since 2012, when the field was created, there have been over 10,000 articles written on this topic to date.

Just to be clear, CRISPR sequences, Cas9 enzymes, etc are not naturally found in human cells. They would have to be extracted from bacteria and other prokaryotes, and then put inside eukaryotic cells like those of humans. Moreover, the cleavage of DNA sites by Cas9 enzymes is only half the story. If scientists wish to add sequences to the genome, they would have to ensure that these sequences have already been accepted into the cell. The cleavage just speeds up the process of modifying the genome by adding these sequences.

There are two CRISPR classes- Class I, which contains types I, III and IV of CRISPR, and Class II, which contains types II, V and VI. The most commonly used type is the type II, which is found in Streptococcus pyogenes (spCas9). However, researchers have also identified 10 other different types of CRISPR proteins. A table of some of them is given below:

As one can see, each protein recognizes a different PAM sequence in the genome before cleaving, and hence is suitable for attacking different types of invading genomes.

Because Cas9 or other cleaving proteins are not naturally found in human cells, they have to be packaged and delivered through lentiviruses or adeno-associated viruses (AAVs). This can be a problem if the proteins are big. For instance, the spCas9 protein is 1366 amino acids (aa) long. Although some smaller cleaving proteins have been discovered, they have the disadvantage of having really complex PAM requirements. For instance, although the SaCas9 is only 1053 aa, it requires a PAM sequence of 5′-NNGRRT-3′. Here, 5′ and 3′ denote the ends of a DNA sequence. Because very few (invading or non-invading) genomes contain this particular sequence, SaCas9 can target very few types of invaders.
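To get a feel for how restrictive a PAM is, one can expand its IUPAC ambiguity codes (N is any base, R is A or G, Y is C or T) and ask how often it would match at a random position. The sketch below assumes bases are uniform and independent, which real genomes are not, so the numbers are only a rough guide. The helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes used by the PAMs mentioned in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def pam_regex(pam):
    """Compile a PAM written with IUPAC codes into a regular expression."""
    return re.compile("".join("[" + IUPAC[c] + "]" for c in pam))


def pam_frequency(pam):
    """Chance that a random position matches, for uniform independent bases."""
    p = 1.0
    for c in pam:
        p *= len(IUPAC[c]) / 4.0
    return p


print(pam_frequency("NGG"))      # SpCas9 PAM
print(pam_frequency("NNGRRT"))   # SaCas9 PAM
print(bool(pam_regex("NNGRRT").fullmatch("AAGAGT")))
```

Under this crude model the SaCas9 PAM matches about one position in 64, against one in 16 for NGG, so the smaller protein buys its size advantage with roughly four times fewer candidate sites.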

Scientists are curious about whether they can re-engineer the naturally found Cas proteins to change their sizes, PAM requirements, etc. They also want to improve the target specificity of these proteins, so that they don’t go cleave the wrong DNA sites. Unfortunately, Cas9 proteins have a natural propensity to not be too site-specific, as they were mainly used in bacteria to attack constantly mutating bacteriophages. In order to study the specificity of Cas9 proteins, scientists tried to map the DNA binding sites of catalytically inactive SpCas9. They saw that the protein was more likely to bind with open chromatin regions. Also, the cleavage rates at sites of incorrect binding were quite low. This was good news, as even though these proteins would bind with undesired sites, they wouldn’t do as much harm there.

Scientists have spent a lot of time thinking of ways to reduce off-site binding and improve target specificity. One method that is useful is changing the delivery method of the Cas9-sgRNA complex, from plasmid-based to delivery as a ribonucleoprotein (RNP) complex. This complex makes the Cas9 protein relatively inactive, and hence less likely to bind to the wrong site in a flurry of activity. Another method is to have two separate sgRNAs direct a nickase Cas9 (**nCas9**), attached to a Fok I enzyme, to cleave a certain site of the genome. A nickase Cas9 protein or nCas9 cleaves only one strand of the DNA helix, and not both. Hence, for such a complex to cleave the wrong site of the genome, both the nCas9 proteins have to make a mistake, which has a smaller probability than just one of them making a mistake. Obviously the two nCas9 proteins are slightly separated, and contain different sequences. Other ways of affecting specificity of the cleaving proteins are increasing or reducing the length of the sgRNAs, attaching self-cleaving catalytic RNAs to the sgRNAs to regulate Cas9 action, using light to regulate Cas9 activity, etc.

What if we just want to identify relevant genome sites, and not cleave them? For this purpose, we can use catalytically inactive dead Cas9 proteins, or dCas9. How are dCas9 proteins formed? A regular CRISPR-Cas9 protein has two catalytic domains- HNH and RuvC, which cleave one DNA strand each. Point mutations in either of them render them ineffective. Hence, a point mutation in only one of them gives rise to a nickase Cas9, and point mutations in both gives rise to a dCas9.

In this section, we will primarily talk about the **nCas9**. The nickase Cas9, or nCas9, is quite useful for converting one base into another, without cleaving both strands of the DNA and hence possibly introducing harmful indels (indels or insertions/deletions are arbitrary insertions or deletions of base pairs in the DNA strand). Komor et al discovered that nCas9, fused to an APOBEC1 deaminase enzyme and a UGI protein, can change C to T without cleaving both strands of the DNA helix. Similarly, another nCas9 complex is now able to change A to G. Scientists can now subsequently introduce STOP codons in genes. A STOP codon is a trinucleotide (can be thought of as a sequence of three bases) present in the RNA, that halts the production of proteins when instructions are being read from the mRNA. Hence, the distance between the START and STOP codons determines the number of amino acids in a protein molecule. Scientists realized that by changing C to T, they could change the trinucleotides CGA, CAG and CAA to TGA, TAG and TAA, which are the three STOP codons. Hence, scientists could effectively manipulate the production of proteins in the ribosomes. Another route that scientists have gone down is forming an nCas9-AID complex, where AID stands for activation-induced cytidine deaminase enzyme. In the absence of UGI, this complex supports local mutations, and hence is a powerful gain-of-function screening tool. Gain-of-function screening tools are those that identify which genes are most suitable for mutation in order for the organism to develop a desired phenotype. Hence, the nCas9-AID complex can introduce mutations at multiple genes, and then select the most suitable.
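The C to T trick for creating STOP codons is easy to check on a toy sequence. The sketch below assumes the sequence is the coding strand, written in DNA letters and already in frame, and it only considers an edit at the first base of a codon, which is the case described above. `c_to_t_stop_sites` is a hypothetical helper name.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}


def c_to_t_stop_sites(cds):
    """Codons of an in-frame coding sequence that become a STOP codon
    when their first base is changed from C to T.

    Returns a list of (codon index, original codon, edited codon)."""
    sites = []
    for k in range(0, len(cds) - 2, 3):
        codon = cds[k:k + 3]
        edited = "T" + codon[1:]
        if codon[0] == "C" and edited in STOP_CODONS:
            sites.append((k // 3, codon, edited))
    return sites


# Codons: ATG CAG GGA CGA TAA
print(c_to_t_stop_sites("ATGCAGGGACGATAA"))
```

Only CGA, CAG and CAA can ever appear in the output, since those are the only codons one C to T change at the first base away from TGA, TAG and TAA.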

In this section, we primarily deal with **dCas9** or catalytically dead Cas9, because we don’t want to cleave any DNA sites.

Gene expression is the process by which a gene is converted into a final product, which may be a protein, non-coding RNA, etc. Hence, regulating gene expression is an important goal for researchers: essentially, we wish to induce “beneficial” genes to express at a higher rate, and the “bad” genes to not express at all. dCas9 was found to tightly bind to DNA sites, and prevent other proteins such as RNA Polymerase II from binding there and starting transcription. This phenomenon was exploited to form the CRISPR interference approach or CRISPRi. Notably, attaching a Kruppel-associated Box or KRAB complex to dCas9 results in an even stronger gene repressor. It has been shown that KRAB-mediated gene repression is associated with deacetylation and methylation of histone proteins. Wait, what are those?

Histones are proteins around which the DNA double helix wraps itself, both at the actual targeted gene, and also at the promoter and enhancer sites of the gene. When acetyl groups are attached to histone molecules, the helix unwinds, and becomes ready for transcription. When these acetyl groups are removed (temporary change), or replaced by methyl groups (permanent change), the DNA helix wraps itself even more tightly to the histone proteins, and hence is not expressed. For the H3 histone protein, it has been noticed that the repression activities of the KRAB-dCas9 complex occurs through H3 deacetylation and increased H3 methylation, especially in the H3 proteins present in the promoter and distal (far away) enhancer regions of the targeted gene. This picture is quite complicated, however, and is explained in more detail later.

In contrast, dCas9 can also promote gene transcription (and hence expression) through fusion with VP64, which is composed of four identical repeating units of VP16, a 16-amino acid chain found in the Herpes simplex virus. Other dCas9 complexes that promote gene expression are SunTag, VPR and SAM. SunTag has a dCas9 fused protein scaffold that contains a repeating peptide array, which is used to recruit multiple copies of an antibody fused effector protein. These effector proteins bind with the histone modules and regulate gene expression. SAM is just a complex of gene expression-promoting proteins comprising dCas9-VP64 and MCP-fused P65-HSF1. The latter is carried to the target site in an engineered sgRNA scaffold. VPR is a complex of VP64, P65 and Rta proteins, all of which also enhance gene expression. CRISPR regulates gene expression, but the actual expression of the gene happens through the regular mechanism of the cell itself, as opposed to other approaches in which gene expression may be facilitated by foreign elements. Hence, this process is more robust and less prone to errors.

Epigenetics refers to the mechanism of differential gene expression, even though the genome might be the same. Hence, two identical twins with the same genome are different in many ways because of differences in gene expression. The “epigenome”, on the other hand, refers to the set of molecules that attach to the genome in order to regulate gene expression. All of this is explained beautifully in this video. The epigenome may also influence post-translational modifications of features. Despite recent epigenomic mapping efforts like the Encyclopedia of DNA elements (ENCODE), the functioning of even basic epigenomic features like histone modifications and DNA methylation remains poorly understood. Scientists now hope to use dCas9 complexes to add or remove epigenetic markers at various locations on the genome, in order to study their impact on gene expression. We have already seen how dCas9 induced DNA methylation at the promoter or enhancer sites leads to gene suppression. It is known that many disorders including some types of cancer are caused by aberrant methylation (too much or too little). Although some drugs exist to counter this, they act on the whole genome globally, and hence may affect undesired sites. Some dCas9 complexes, like dCas9 fused to DNMT3A, can rectify this by promoting methylation only at the targeted sites. Note that even Cas9 proteins are not known for being very target specific, and are often found bound to undesired sites. However, gene expression does not change at these undesired sites. This makes the DNMT3A complex a useful tool to promote methylation.

On the other hand, if we want to suppress excessive methylation, TET proteins are pretty useful. Researchers formed dCas9-TET1 complexes to promote demethylation at desired sites. The outcome was found to be robust, as there was a 90% reduction in methylation at CpG dinucleotides. The impact at off-target sites was yet to be studied.

Although methylation is seen as a way of suppressing gene expression, it can also promote gene expression in some cases. This phenomenon is beautifully explained in this video. There are 4 types of core histone proteins- H2A, H2B, H3 and H4. The H3 protein contains both the H3K4 and the H3K27 residues. Both the acetylation and methylation of H3K4 promote gene expression, while the trimethylation of H3K27 only suppresses gene expression (acetylation of H3K27 promotes gene expression though). This duo can act as a bivalent regulator of gene expression, in which one part promotes and the other represses gene expression. Researchers are curious about controlling the methylation and acetylation of histone residues via dCas9 complexes.

In order to control the methylation and acetylation of H3 residues, researchers used a dCas9 complex to recruit LSD1 at the desired sites to reduce the number of enhancer marks H3K4me2 and H3K27ac (remember that the acetylation of H3K27 has made it an enhancer). Hence, this complex serves to repress gene expression. On the other hand, the dCas9-P300 complex results in a significant increase in the number of H3K27ac, which promotes gene expression. Other dCas9 complexes have also been used to increase H3K4me3, which promotes gene expression, or reduce H3K27ac, which represses it. The global footprint (impact on the genome globally) of such dCas9 complexes is still unknown.

Although technology to image specific parts of the genome has existed for some time, it was mainly done in vitro (in a test tube) through Fluorescent In-Situ Hybridization (FISH) methods, and not in vivo (in a live organism). The development of CRISPR has revolutionized live cell chromatin imaging.

But “how bright does the bulb have to be”? If we imagine the dCas9 complexes as bulbs that attach themselves to desired genomic loci, these are likely to be too small and faint to register on our machines. Hence ideally, such complexes should target repeating genomic sequences that are close together, so that multiple such bulbs can go and attach themselves to each of these repeating sequences, giving out a brighter light that can be registered. For a non-repeating sequence, 26-36 sgRNAs need to attach themselves to one single sequence in order to produce a clear enough signal. So many sgRNAs attaching themselves to a single site is statistically quite unlikely. To overcome this problem, researchers came up with an sgRNA scaffold containing 16 MS2 binding molecules. All of these molecules travel together, and hence attach themselves to the binding site when the sgRNA reaches the desired loci. Put together, these now generate a strong enough signal for imaging. Using these scaffolds, repeated genomic sequences can now be imaged with just 4 sgRNAs, and non-repeating sequences can now be imaged with just 1 sgRNA, as explained above.

Chromatin consists of strands of DNA that are arranged linearly. What if we could bring the promoter and enhancer for a gene closer together, or push them further apart? Would this affect gene expression? Yes! That is why researchers are interested in forming chromatin loops, or changing the topology of chromatin strands in other ways. Morgan et al took two dimerizable proteins (proteins that had a tendency to attract each other and form a bond), and attached them to two different dCas9 complexes. These complexes now attached themselves to the promoter and enhancer regions separately, and then the dimer bond formation between the two proteins brought the promoter and enhancer closer together. This did result in an increase in gene expression.

How do we find out which gene affects a particular phenotype, say cell proliferation? Checking each gene out of the tens of thousands available is surely a daunting task. What if we could check thousands of genes at once? This task is accomplished using hundreds of thousands of sgRNAs in a large population of cells. The way that this works is this: we ensure that each cell receives at most one sgRNA, and each gene is targeted by 6-10 different sgRNAs. This means that at least 6-10 cells are used to study the impact of one gene on the desired phenotype, which in this case is cell proliferation. The sgRNAs which hit the correct gene will cause their cells to proliferate fast, and the other cells will die out eventually. This helps us zero in on the gene which causes cell proliferation. Of course we will have to keep track of which sgRNA goes to which cell, which will allow us to make the right deductions.
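The bookkeeping behind such a screen can be sketched as follows. This is an illustration under assumptions that go beyond the text: I assume that the abundance of each sgRNA is counted (for example by sequencing) at the start and at the end of the experiment, and that each gene is scored by the median log2 fold-change of its sgRNAs. The guide names, gene names and counts are made up, and real analysis pipelines are considerably more careful about normalisation and statistics.

```python
import math
from collections import defaultdict
from statistics import median


def gene_scores(counts_before, counts_after, guide_to_gene, pseudo=1.0):
    """Median log2 fold-change of sgRNA abundance for each gene."""
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())
    per_gene = defaultdict(list)
    for guide, gene in guide_to_gene.items():
        # The pseudocount avoids division by zero for guides that vanish.
        frac_before = (counts_before.get(guide, 0) + pseudo) / total_before
        frac_after = (counts_after.get(guide, 0) + pseudo) / total_after
        per_gene[gene].append(math.log2(frac_after / frac_before))
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


guide_to_gene = {"g1": "GENE_A", "g2": "GENE_A", "g3": "GENE_B", "g4": "GENE_B"}
before = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
after = {"g1": 400, "g2": 380, "g3": 10, "g4": 10}
print(gene_scores(before, after, guide_to_gene))
```

Using several sgRNAs per gene and taking the median means that one badly behaved sgRNA does not decide the verdict on its gene.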

One major aim for future researchers should be to reduce the size of the existing Cas proteins, so that they may be easily transported using virus vectors. Another aim should be the careful design of CRISPR procedures, so that “gene drives” that potentially impact entire populations do not cause harm in the long run.

An important obstacle to overcome is the fact that more than half of all humans experience an immune response to the introduction of Cas9 proteins in cells (this is called immunogenicity). One possible solution to this problem is the development of Cas9 proteins to which humans have not been exposed before, so that we don’t have an immune response against them.

CRISPR has great potential to benefit society and eradicate formidable diseases. I am excited to see what comes next.

- The CRISPR toolkit for genome editing and beyond
- https://mbio.asm.org/content/5/4/e01730-14
- https://www.youtube.com/watch?v=vP23dkY0mPo
- https://www.youtube.com/watch?v=lLvdxtPaYGM
- https://en.wikipedia.org/wiki/Bivalent_chromatin
- https://en.wikipedia.org/wiki/Epigenetics
- https://www.youtube.com/watch?v=iSEEw4Vs_B4

1. Why the affine connection? Why this notion of derivative in particular?

A common sentiment that goes around in mathematical circles is that we need a coordinate invariant notion of a derivative. When we say , we are specifying a Euclidean coordinate chart, using which we are differentiating the function . But Euclidean charts are not always the most convenient setting for calculations- sometimes we need polar coordinates, for instance. Hence, if we could represent equations in a way that does not assume a coordinate chart, it will make life much simpler for us. There would be no complicated Euclidean-to-polar coordinate conversion operations, for example.

Let us now dig slightly deeper into what a coordinate invariant mathematical expression actually means. Suppose we have a physical law saying that a quantity exists such that . Now if we have a transformation such that , then we know that this law cannot hold true anymore. This is because if , then . Hence, when we state this physical law, we also have to specify the coordinate system that we must choose.

Much importance, at least in Physics, is given to the fact that there is no preferred coordinate system. All inertial systems have the same Newton’s laws. In Special Relativity, we find quantities that are Lorentz invariant. Why can we not just specify the coordinate system each time we mention a law? This is because things can get unmanageable and cumbersome if we propose a different law for each moving reference frame. Moreover, these “laws” might also change when we change units of space and time: in fact a choice of such units is also a choice of coordinate systems. Therefore, physical laws should be such that regardless of whether we choose metres or feet, and regardless of whether we choose Euclidean coordinates or polar coordinates, they remain invariant. I can now choose my preferred coordinate system which simplifies calculations the most, and then arrive upon the answer.

Now that we’ve established that we need a coordinate invariant notion of a derivative, why the affine connection in particular? Mainly because the properties and simplify a lot of calculations. These are just constraints that we put on the definition, which give us a unique connection. We could also have put other constraints, and perhaps gotten a different unique connection.

How do we know connections are coordinate invariant? Because connections, by definition, have the property that . Hence, the coordinate invariance property follows from the definition itself. When we don’t specify a specific coordinate system, and claim that a certain mathematical expression holds in general, we have written down a coordinate invariant expression. This is exactly what we do here.

Another important point to note is that because connections follow the product rule of differentiation, the difference of two connections is always a tensor. Hence, if is the affine connection and is any other connection, we can just define the tensor , and we’re done. How do we define the affine connection in an intuitive way then? We seem to have a lot of choice, as we can choose any other already defined connection , and then write some tensor. Here, we choose to be Euclidean differentiation. This allows us to interpret the affine connection as a “correction” to regular differentiation.
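The "correction" picture can be written out explicitly. The following is only a sketch in standard textbook notation, which is an assumption here and may differ from the symbols used elsewhere in this post: $\partial_a$ stands for ordinary differentiation in the chosen chart, and $\Gamma^b_{\ ac}$ for the correction term.

$$\nabla_a v^b = \partial_a v^b + \Gamma^b_{\ ac}\, v^c$$

When the correction term vanishes, the affine connection reduces to regular differentiation of the components of $v^b$.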

2. Why do we deal with abstract notation at all? Why do we have something like ? The indices show what kinds of mathematical objects we are dealing with. The , for instance, tells us that it takes in a vector. , when it accepts vectors, will become a function.

Let us now consider the tensor . How do we know where and go? Does go to the or the ? We solve this conundrum by the following rule: goes to the left-most place it can go, and goes to the left-most place after that. Another, perhaps clearer way of saying this is that we contract with the index and with the index. We use this notation because of the tensorial nature of this mathematical object- when goes to the , we get .

So is that it? Is this expression equal to ? Yep. It’s as simple as that. But this doesn’t “mean” anything. Let me try and elaborate on this statement. This is just abstract notation. Fluff. Refined nonsense. We know which vector goes where. We have some information about the mathematical object we are dealing with. However, performing actual calculations is a completely different beast.

How do the calculations go, though? We first select a coordinate system and vector space basis elements. We then perform the tensorial differentiation via the connection . It is only then that we plug in the vectors into the right places. What does the do? There are again two levels of understanding- one level is just manipulating this expression abstractly, and another is actually choosing a coordinate system and calculating the final expression. For the abstract level, we can just write this as . Now the actual calculation: suppose we choose an orthonormal basis. Then we can write as , and then simplify. Let us simplify this particular expression. This becomes , which simplifies to

Each of these terms can also be simplified using the same rules of tensorial differentiation. Hence, the actual calculation is a long iterative process. When we deal with these expressions abstractly, however, manipulations are generally substantially shorter.

3. Whenever we perform calculations at a point, they become substantially shorter and easier. Why? And what does performing a calculation at a point even mean? When we select a tensorial operation and vector or co-vector fields to operate on, we are selecting global entities. All of the mathematical objects defined above are defined over the whole space or manifold. However, if , then , where is a tensor. The utility of this fact is that given a complicated vector field , we can choose a really simple with special properties that will make life easy. For instance, when dealing with tensors, we can always choose such that they . This substantially simplifies calculations. However, the most common prerequisite for such drastic simplifications is that we are dealing with tensors, which is something one should always check.

4. Raising and lowering indices- We know that a lowered index means that the tensor accepts vectors, and a raised index means that it accepts co-vectors. However, why do we raise a lowered index, and vice-versa? We will first talk about lowering an index. Consider a vector . We can lower that index via . In common parlance, we say that is now a co-vector. But how did we magically get a co-vector by just multiplying with ? Let us see what happens when we contract a vector with . We get , which is the inner product of the two vectors! Hence, converted a vector into a co-vector because it transformed .

The same can be said about the raising of indices for co-vectors. Because inner products of all kinds of tensors are defined only using the metric, or are involved in raising or lowering indices for all tensors.
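To make index lowering concrete, here is a small numerical sketch. The metric (plane polar coordinates at a fixed radius) and the vector components are illustrative assumptions, not values taken from this post. It checks that contracting the lowered vector with a second vector gives the same number as the inner product computed directly from the metric.

```python
import numpy as np

# Assumed example: plane polar coordinates (r, theta) at radius r = 2,
# where the metric components are diag(1, r^2).
r = 2.0
g = np.diag([1.0, r**2])

v = np.array([1.0, 0.5])   # components v^a (hypothetical values)
w = np.array([3.0, -1.0])  # components w^a (hypothetical values)

# Lower the index: v_a = g_ab v^b
v_lower = np.einsum("ab,b->a", g, v)

# Contract the co-vector with w: v_a w^a
contraction = np.einsum("a,a->", v_lower, w)

# Inner product computed directly from the metric: g_ab v^a w^b
inner_product = np.einsum("ab,a,b->", g, v, w)

# Raising the index with the inverse metric recovers the original vector.
v_raised = np.einsum("ab,b->a", np.linalg.inv(g), v_lower)
```

Here `contraction` and `inner_product` agree, which is the sense in which the metric turns a vector into a co-vector.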

5. What does do exactly?

Essentially if and only if . This implies that because, by definition, , and is the unique vector with this property.

6. Who came first, the dual or the metric? If we think of a function as a co-vector, then we know that its dual can only be defined in terms of its metric. Hence, we can conclude that duals are not defined independently. The dual of a vector is a covector such that given a vector , . In fact, the dual doesn’t have to be such that . The value of completely depends on the metric.

7. What does it mean to raise the index of ? In other words, what does mean? When we contract a co-vector with in the form , then what we are really doing is determining the inner product . However, is an operator that acts on other tensors. Hence, the actual calculation is a completely different story than this abstract nonsense. For co-vector , consider . Like before, the actual calculation requires substantial simplification before we can just bring in the vectors and sum over everything.

]]>Currently, 5-15% of the world’s energy goes into powering computing devices. Surely we’d love to have devices that consume much less power. We are, in fact, acquainted with a computing device that consumes much less power- the human brain. Can we make computing devices that mimic the human brain in terms of power consumption, given that we are ready to give up some of the accuracy that modern computational devices possess? Turns out that we can…if we can construct working neuromorphic devices.

Other capabilities that a brain-like device might possess are speech and image recognition. Clearly, human brains are capable of recognizing people and understanding what they are saying. Despite impressive capabilities in other areas, conventional devices are still thought to be pretty bad at this, despite consuming much more power. Hence, it might do us good to manufacture brain-like devices.

But wait. Devices have been created which are “pretty” good at speech and image recognition. Systems have now been created with 5 billion transistors on a single die, and feature sizes approaching 10 nm! Moreover, parallel computing helps us put multiple such devices to work at the same time. Surely things are looking good in terms of capabilities (if we can forget that such devices consume millions of times more power). However, this dream cannot last forever. There will come a time when we cannot pack any more transistors on a single die. Because of energy dissipation, we will not be able to pack transistors more densely (otherwise, because of the large amount of energy dissipated in too small an area, the whole die may catch on fire or be severely damaged). In short, we will have to create a fundamentally different kind of technology, that does not just involve packing more and more transistors on a chip. Neuromorphic devices promise to be this different kind of technology.

It is important to note that neuromorphic devices may not be as accurate as conventional computing devices. Hence, they need only be used when low-power “functional” approximations are required, as opposed to high-power precise values. For instance, if we need to find out whether Jupiter is closer to us or to the Sun, we need only know the approximate distances in millions of miles, and not down to the inch. Neuromorphic computing may be useful to us in these situations.

Another feature that is important to us is that neuromorphic devices should be able to learn from past experience, much like humans do. If you take two people and put them in the same situation, say lock them up in a room, they may react differently. One person may panic, possibly because of an earlier claustrophobic experience, while the other person may stay relatively calm, knowing that someone will most likely rescue them, based on a similar past experience. We want neuromorphic devices to also have different reactions to the same situation. This means that we want neuromorphic devices to have self-learning capabilities when exposed to external stimuli, and we want this self-learning to change them and their reactions to future stimuli. This is not what happens in conventional devices, as all devices of the same kind react in the very same way to external stimuli.

The von Neumann architecture, named after the famed polymath John von Neumann, has the processor and memory units separated. Whenever a computation needs to be performed, the processor calls for the required information from the memory unit, performs the computation, and then stores the answer in the memory unit. Although processing power and storage capacity have both increased over the years, the rate of transfer of information between memory and the processor has stayed relatively stagnant. Hence, this is a bottleneck that is preventing faster computations.

Neuromorphic devices on the other hand would have their memory and processing units located right next to each other. In fact, in the human brain, there is not such a clear demarcation of what the storage unit is and the processing unit is, and different parts take on either or both roles based on learning and adapting. Hence, the bottleneck present in von Neumann architectures can be avoided.

Also, at the device level, the von Neumann architecture is made up of resistors, capacitors, etc. These are elements that perform clearly differentiated functions, and are not fault tolerant. If the resistors in the device stop working, the whole device stops working. The human brain, and hence the neuromorphic architecture, consists of just one element- the neuron. The neuron consists of dendrites, the soma, the axon and synapses. While these parts do perform some specific functions, most of them multi-task, and can learn to perform the tasks of another neuron or another part of the same neuron. Hence, neuromorphic architecture is fault resistant by design, which means that if some part of it stops working, the device can still adapt and keep going.

Let us now directly compare the two architectures:

In the advantages column, black font shows an advantage for the von Neumann architecture, and red font shows an advantage for the neuromorphic architecture. It becomes clear that although neuromorphic devices are much less reliable than conventional devices, they consume a million times less power, and neurons can also be much more densely packed on a chip than transistors because of the smaller amount of energy dissipated. Note that the above data for neuromorphic systems was not gathered from actual neuromorphic devices (good scalable devices are yet to be built), but from simulations of such devices on conventional transistor chips.

Many existing devices can be modified to approximate neuromorphic devices. On one hand, we have artificial neural network architectures whose processing algorithms depend on matrix operations (which are much faster than other algorithms). These also consume much less power than conventional CPU devices, although they are less accurate. On the other side of the spectrum we have the **actual** digital or analog implementation of neurons- dendrites, soma, axon and synapses. Axons are implemented as conventional wires, and synapses can be constructed to show learning with time (this is called spike-timing-dependent plasticity, or STDP). The analog implementation is much more power efficient than the digital implementation, and also requires fewer transistors (yes, both still require transistors). However, the brain is still four or five orders of magnitude more power efficient.

Some devices, like IBM’s TrueNorth chip and FACETS at the University of Heidelberg, contain millions of neurons and billions of synapses. Hence, they may be coming close to approximating the complexity and power of the human brain. Why are there more synapses than neurons? A synapse may be thought of as a connection between one neuron and another. Each neuron needs to be connected to very many other neurons to be useful. Hence, the number of synapses needs to be orders of magnitude bigger than the number of neurons. Existing neuromorphic chips restrict the number of synapses per neuron to 256. We might need many more synapses per neuron in order to become “more like the brain”.

Also, in current neuromorphic devices, synapses are arranged in a dense crossbar configuration.

Synapses being in this configuration implies that multiple neurons can be connected to each other at the same time, through direct contact. However, the crossbar cannot have an arbitrary fan-in fan-out ratio. Hence, the number of neurons that can be connected to each other through this crossbar configuration is clearly controlled.

These properties of the brain need to be replicated in our ideal neuromorphic devices of the future:

- Spiking- Neurons work only when they “spike”. The intricacies of how they work are explained in this beautiful video. Essentially, neurons lie at rest until their voltage is disturbed to above a threshold value, after which they perform whatever function they’re expected to perform, and then go back to rest. It’s like a bored parent sitting in front of the TV, who doesn’t really care if the kids are dirtying the house or fighting with each other. They only spring into action if they hear screaming or see blood, discipline the children, and then again go back to resting and watching TV. That sure sounds much more relaxing than a hyperactive parent who is constantly running around their children, screaming at them and micro-managing all their activities. Conventional devices are like those hyperactive parents, always working, and hence constantly drawing copious amounts of power. Neuromorphic devices would be like the bored parent: they work only when they’re asked to work, and are at rest the rest of the time.
- Plasticity- Neuromorphic devices, we hope, will be self-learning. For this, they need to have a little plasticity- be able to change their properties and parameters as they learn.
- Fan in/fan out- Conventional devices have a much lower number of connections between various units than that required in neuromorphic devices. We still have to figure out whether this part is essential, or we can do with a smaller number of connections.
- Hebbian learning- A much used phrase in neuroscience is “neurons that fire together, wire together”. What does this mean? Imagine that performing a particular task involves neuron A sending a message to neuron B. At first, neuron B might not be as receptive to neuron A’s message, as the synapse between them might be weak or relatively ineffective. However, the more this synapse is activated, the stronger it becomes. After a few times, it becomes super easy for neuron A to transfer a message to neuron B, and hence it becomes much easier, faster, and almost second nature for a person to perform that task. This reaffirms conventional wisdom that practicing a skill will make you better at it. We need neuromorphic devices to have the same property- the more times one neuron activates another, the stronger their connection should become. In contrast, this does not happen at all in conventional computing, and your computer doesn’t run Fortnite faster if you’ve played it every day for the last 5 years.
- Criticality- The brain is thought to be in a critical state. By this, we mean that the brain, although not chaotic and hence relatively stable, is capable of changing itself as it is exposed to the varied experiences that life throws at us. We want neuromorphic devices to be the same way.
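The Hebbian learning rule from the list above can be sketched in a few lines. This is a toy model under stated assumptions: the function name, the learning rate and the saturating update rule are all hypothetical choices made for illustration, not something taken from the paper.

```python
def hebbian_update(weight, pre_fired, post_fired, learning_rate=0.2, max_weight=1.0):
    """Strengthen a synapse only when both neurons fire in the same step."""
    if pre_fired and post_fired:
        # Move a fraction of the way toward the ceiling, so the weight saturates
        # instead of growing without bound (an assumed, illustrative rule).
        weight += learning_rate * (max_weight - weight)
    return weight


# Neuron A repeatedly triggers neuron B: the synapse gets stronger each time.
weight = 0.1
history = [weight]
for _ in range(5):
    weight = hebbian_update(weight, pre_fired=True, post_fired=True)
    history.append(weight)

# If only one of the two neurons fires, the synapse is left unchanged.
unchanged = hebbian_update(0.1, pre_fired=True, post_fired=False)
```

The list `history` grows with every co-activation, which is the “fire together, wire together” behaviour in miniature.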

The four building blocks of a neuromorphic device would be:

1. Synapse/memristor- The synapse is perhaps the most important of the building blocks. It transmits messages between neurons, and strengthens or weakens with time. The device that can perhaps mimic it most convincingly is the memristor. I learned about this device a couple of weeks back, and it is by far my favorite piece of technology. This video does a fantastic job of explaining what a memristor is and what it does. It essentially acts like a resistor with memory: its resistance depends on the history of the current that has flowed through it. The capability of a memristor that we are concerned with here is that it should be able to strengthen with time. Let us look at the diagrams below:

A regular transistor, as shown in the graph on the left, shows the same increase in current as we apply a voltage to it. Each time we apply this voltage, the current goes up by the same amount. A memristor on the other hand, as shown in the graph on the right, shows a larger current every time we apply the **same** voltage. In other words, as opposed to a regular transistor, the transfer of messages by the memristor gets easier every time.

2. Soma/Neuristor- The soma is the body of the neuron where the threshold voltage is reached, which causes the neuron to spike and send a message to other neurons. One possible implementation is a capacitor coupled with a memristor. The capacitor, when it reaches the threshold voltage, would activate the memristor, which in turn would send the message.

3. Axon/Long wire- The axon was previously thought to be responsible only for signal conduction between the soma and the next neuron. However, it is now known to also be responsible for signal processing. It can be implemented using a long wire.

4. Dendrite/Short wire- Dendrites are the input wires that bring messages from outside **to** the soma. They have also been shown to possess pattern recognition capabilities. They can be implemented using short wires.
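The interplay between a strengthening synapse and a thresholding soma can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the class names, the conductance update rule, the leak factor and the numerical values are hypothetical and do not model any real memristor or neuristor.

```python
class ToyMemristorSynapse:
    """Toy synapse whose conductance grows each time it is used."""

    def __init__(self, conductance=0.2, gain=0.25, max_conductance=1.0):
        self.conductance = conductance
        self.gain = gain
        self.max_conductance = max_conductance

    def transmit(self, voltage):
        current = self.conductance * voltage
        # Each use nudges the conductance toward its ceiling (assumed rule).
        self.conductance += self.gain * (self.max_conductance - self.conductance)
        return current


class IntegrateAndFireSoma:
    """Toy soma: accumulates input, leaks a little, spikes at a threshold."""

    def __init__(self, threshold=1.0, leak=0.9):
        self.threshold = threshold
        self.leak = leak
        self.potential = 0.0

    def step(self, input_current):
        self.potential = self.leak * self.potential + input_current
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset to rest after spiking
            return True
        return False


synapse = ToyMemristorSynapse()
soma = IntegrateAndFireSoma()
currents, spikes = [], []
for _ in range(10):
    current = synapse.transmit(voltage=0.5)  # the same voltage pulse every time
    currents.append(current)
    spikes.append(soma.step(current))
```

The same voltage pulse produces a larger current on each application, and the soma stays silent until enough input has accumulated to cross its threshold.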

There are multiple issues to be resolved if we implement neuromorphic devices as suggested above.

For one, memristors often show chaotic behavior. They are designed to show huge increases in drain current when a small voltage is applied to them. Hence, when this voltage difference exists by accident in the environment, memristors can allow huge currents to pass when not required to do so. They can also inject previously stored energy into the device, which leads to undesired voltage amplification.

The way that a synapse implementation can work is the following: synapses are required to show stronger connections the more they’re activated. Hence, if two electrodes are separated by an insulator to not allow large current to pass, as those electrodes are activated more and more, we need metal impurities to slowly form between the two electrodes in order to allow currents and hence signals to flow. However, the rate of formation and positions of these impurities is still random, and hence a lot more study is required so that we may be able to control this “strengthening” of the synapse.

Spin torque switching happens when a relatively large current of spin-polarized electrons causes the magnetization of a thin magnetic layer to switch. This article is a beautiful explanation of how this happens. Such devices are used in Magnetic RAMs and other storage devices. However, the stability of such a device, along with the effect of impurities on it at the nanoscale, still needs to be studied.

Nanoelectronics, on the whole, is a complicated endeavor. Effects of impurities, Oersted fields, etc, which are not noticeable at larger scales, become all-important at the nanoscale. If we are to squeeze millions of neurons onto a chip, we have to study nanoelectronics. Hence, the amount of progress we can make in this field directly controls the amount of progress we can make in building powerful neuromorphic devices.

This paper, as noted before, was written at a conference which hoped to lay down a roadmap that computer scientists and material scientists could follow to make scalable neuromorphic devices. Its recommendations, for such a dream to reach fruition, are the following:

Computer scientists and material scientists should work together. Computer scientists can worry about optimizing algorithms and putting everything together, while material scientists can worry about the best possible implementation of the building blocks. The successful construction of such a device will follow only from singling out the best possible materials for the various parts, which can come only from a deep grasp of the quantum properties of materials along with a good understanding of nanotechnology.

Moreover, the device should exhibit three dimensional reconstruction, external and internal healing, distributed power delivery, fault tolerance, etc. It is important to understand that the construction of such a device will not happen overnight, and hence scientists should recognize and implement useful intermediate steps, that will ultimately result in a neuromorphic revolution.

At the time that this paper was written, there was an international collaboration underway that wanted to capture the image of a black hole’s Event Horizon for the first time in history. But the angular resolution needed was an order of magnitude finer than what was available with current technology. What does this mean?

We know that there are billions of stars in the sky. However, stars that are very far away don’t appear as distinct stars to us. Instead, they appear to be broiling in a hazy cosmic soup. This is because the naked eye is not very good at distinguishing between relatively close stars that are very far away.

When light rays from far away stars enter the tiny apertures in our eyes, they undergo diffraction. For close enough stars, these diffraction patterns merge into one amorphous blob. Hence, instead of distinct stars, we just see a cosmic haze.

The Rayleigh formula for angular resolution is $\theta = 1.22\,\frac{\lambda}{D}$, where $\theta$ is the smallest resolvable angular separation, $\lambda$ is the wavelength of the incoming signal and $D$ is the diameter of the aperture (of the human eye or a telescope). Given that the observed wavelength and the diameter of the aperture are fixed, if two stars are separated by an angle of at least $\theta$, they can be seen as distinct entities. If they have an angular separation that is smaller than that, they will be seen as one hazy blob.

How does all this relate to taking the photo of the event horizon of a black hole? Imagine that the event horizon subtends some angle at the observer. If this angle is at least as large as the resolution limit given by the Rayleigh formula, then we will be able to take an image of a black hole that has at least two distinct points. Hence, the event horizon will appear to be an extended object instead of just a dot against the night sky. This is what we mean by taking a photo. When we talk about taking a photo of Jupiter, we refer to a photo containing details of its stripes, the great red spot, etc, and not just a pale dot in the night sky. We hope to accomplish a similar feat with black holes (this hope was obviously fulfilled in 2019).

Now as black holes are very far away, the angle that they subtend might be very small. Hence, keeping the wavelength constant, we will need a huge aperture diameter. For example, it is predicted that the black hole at the center of the Milky Way has an angular diameter (relative to viewers on Earth) of radians. For , we will have to build a telescope of diameter 13,000 km to resolve it. This of course is not realistic. To overcome this constraint, the authors plan on using multiple telescopes spread over the world, and then use their combined data to emulate the data collected by one giant telescope. Using multiple telescopes in this manner is known as Very Long Baseline Interferometry (VLBI). Hence, VLBI can solve our problems of angular resolution. But what about the post-VLBI stuff? What about the creation of the actual image? There exist multiple algorithms to produce images from given data, with varied degrees of clarity.
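To get a feel for the numbers, the Rayleigh formula can be inverted to give the aperture needed for a target resolution. The inputs below (a 1.3 mm observing wavelength and a target resolution of $10^{-10}$ radians) are illustrative assumptions rather than values quoted in this post, and the function name is hypothetical.

```python
def min_aperture_diameter(wavelength_m, angle_rad):
    """Aperture diameter (in metres) needed to resolve a given angle,
    using the Rayleigh criterion theta = 1.22 * wavelength / diameter."""
    return 1.22 * wavelength_m / angle_rad


# Assumed, illustrative inputs.
wavelength_m = 1.3e-3        # 1.3 mm
target_resolution = 1e-10    # radians

diameter_m = min_aperture_diameter(wavelength_m, target_resolution)
diameter_km = diameter_m / 1000.0
```

With these assumed inputs the required diameter comes out at roughly the size of the Earth itself, which is why a single dish is out of the question.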

The authors contend that the **Continuous High-Resolution Image Reconstruction using Path Priors**, or **CHIRP** algorithm, can reconstruct better images than other existing algorithms from the same data.

Before we talk about the CHIRP algorithm, we need to talk about the VLBI process so that we understand the kind of data that we’re collecting. As explained before, in order to be able to get a better angular resolution, multiple telescopes are needed. Such an array of telescopes is known as an interferometer. This proves problematic though. Two telescopes located at different places will receive different signals: signals that are different in terms of phase, etc. How can we combine all of these observations to produce one coherent signal then?

Here’s how we do it: fix a wavelength amongst all the incoming wavelengths. This wavelength is composed of very many spatial frequencies. Spatial frequency can be thought of as the number of light and dark lines in unit distance, when a light beam is diffracted by a particular apparatus. Look at the diagram below, where SF stands for spatial frequency

One may imagine that the incoming signal is the sum of waves of very many different spatial frequencies. We calculate to what extent the two telescopes under consideration receive waves with the same spatial frequency. Another way to look at it is we measure to what extent the two telescopes receive the “same” wave (with the same frequency, etc). This is calculated by the correlation function, which is defined as

This formula comes from the Van Cittert-Zernike Theorem, and the Wikipedia article on it is extremely well-written. Here the telescopes are numbered and , and their correlation function (also known as visibility) is denoted as . Just some notation- denotes the angular position of the celestial object, and denotes the position vector of one telescope with respect to another. The basic gist of the Van Cittert-Zernike theorem is this: if you consider any two stars (or even different points on the same star), they emit electromagnetic waves with different properties like spatial frequency, etc. Hence, the waves emitted by them are not “coherent” at any two points. An analogy is that if you drop two pebbles into water, at points close to the pebbles, different points on the water oscillate differently. Some oscillate wildly, whilst others don’t oscillate at all (destructive interference). However, points that are a little further away from the two pebbles oscillate similarly. They show the same behavior. Similarly, points that are far away from these non-coherent stars will receive waves of similar properties. The extent of similarity, which is definitely much greater than , is given by the formula above.
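For reference, the Van Cittert-Zernike relation is commonly written in the following form. The symbols are assumptions made for this sketch and may not match the notation of the original formula: $\Gamma_{12}$ is the visibility for telescopes 1 and 2, $(u,v)$ is the baseline between them measured in wavelengths, and $I_{\lambda}(l,m)$ is the brightness of the source at angular position $(l,m)$.

$$\Gamma_{12}(u,v) \approx \int_l \int_m e^{-i 2\pi (ul + vm)}\, I_{\lambda}(l,m)\, dl\, dm$$

Read this way, each pair of telescopes samples one Fourier component of the source image, at a spatial frequency set by the separation of the two telescopes.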

Now note that if we have $N$ telescopes, we can form $\frac{N(N-1)}{2}$ pairs of telescopes. Hence, we have $\frac{N(N-1)}{2}$ visibility numbers. These numbers change due to relative atmospheric conditions, as the atmosphere introduces a delay in the phase of the incoming waves, and it introduces a different delay for each telescope as atmospheric conditions are different at different points on the planet. Hence becomes . However, if we have a set of three telescopes $i,j,k$, then the product remains unchanged, even after the phase delay introduced by the atmosphere. We refer to this triple product as the bispectrum (I will refer to the plural as bispectra). This is a useful invariant to have, as we don’t have to wonder about atmospheric delays anymore.
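The invariance of the triple product can be checked numerically. The sketch below assumes that the atmosphere multiplies the visibility of the pair $(i,j)$ by a phase factor $e^{i(\phi_i - \phi_j)}$, where $\phi_i$ is the delay at telescope $i$; this sign convention, the function names and the random test data are all assumptions made for illustration.

```python
import numpy as np


def bispectrum(vis, i, j, k):
    """Triple product of visibilities around the closed triangle i -> j -> k -> i."""
    return vis[i, j] * vis[j, k] * vis[k, i]


def apply_atmosphere(vis, station_phases):
    """Corrupt each visibility V_ij by exp(i * (phi_i - phi_j))."""
    gains = np.exp(1j * station_phases)
    return vis * np.outer(gains, np.conj(gains))


rng = np.random.default_rng(0)
n_telescopes = 4

# Hypothetical "true" visibilities; the matrix is Hermitian since V_ji = conj(V_ij).
raw = rng.normal(size=(n_telescopes, n_telescopes)) \
    + 1j * rng.normal(size=(n_telescopes, n_telescopes))
true_vis = raw + raw.conj().T

# A different, unknown phase delay at each telescope.
phases = rng.uniform(-np.pi, np.pi, size=n_telescopes)
observed_vis = apply_atmosphere(true_vis, phases)

clean = bispectrum(true_vis, 0, 1, 2)
corrupted = bispectrum(observed_vis, 0, 1, 2)
```

The individual visibilities change under the corruption, but `clean` and `corrupted` agree, because the three phase factors around the triangle cancel.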

This introduces a number of constraints into the system, and hence reduces other independent additional constraints that we impose on the system. Why? Because we have independent variables of the form , and these independent variables can vary arbitrarily (because of the delay introduced by the atmosphere, for instance). However, we now have constraints. If we can only have fixed number of independent constraints on the system so that it still remains well-defined, then we have to reduce the number of other additional constraints that we can impose on the system.

Let’s back up. It seems that we have a system of equations in the variables , and we want to find these variables. But why do we want to find these variables at all? How will these help us reconstruct the image of a celestial object? It seems that with only this information, the authors have an algorithm to construct an image of the source. This is incredible, and I will try to provide more details below. Anyway, from this point we will assume that we have determined all the visibility () and bispectra () values.

What algorithm should we now use to construct an image from the given data? Some known algorithms are given below:

**CLEAN**– This algorithm assumes that the image is made up of discrete bright points. It finds the brightest point on the image, and then “deconvolves” around that location. Deconvolution can be thought of as getting a sharper image from hazy data. After a few iterations, the brightness of the original source is reduced. Because this algorithm focuses on the brightest points, it often produces bad images of extended objects with no particular brightest points.

**OPTICAL INTERFEROMETRY**– We will talk about two algorithms here- the BiSpectrum Maximum Entropy Method (BSMEM), and SQUEEZE. Both are capable of using bispectrum data to construct images. BSMEM uses a Bayesian approach to image reconstruction, which may be thought of as determining what is the most probable “actual” image of the source, given the collected data. SQUEEZE on the other hand moves a set of point sources around in the Field of View, and then takes an “average” of the images obtained.

The authors propose their own algorithm CHIRP, which produces better images than the algorithms mentioned above, with minimal human intervention.

Let us think back to the formula

It is clearly impossible to calculate for different values of , as the source is far away, and the received signal is hazy and impossible to resolve. However, the visibility and bispectra values have been determined above. Using these values, we determine the value of $I_{\lambda}(l,m)$ using the inverse Fourier transform (with an intermediate step in which we discretize ). More details to be given shortly.

The authors parametrize a continuous image using a discrete set of parameters. Having a continuous image reduces modeling errors during optimization. Now remember that we have the Fourier transforms of the original signal for each pair of telescopes (the visibility values). Because the signal is the weighted sum of these Fourier transforms, we can represent the original image as the sum of these continuous, scaled and shifted pulses (although the weights of these pulses, and hence the original image, are unknown at this stage). For a scene defined in the angular range and , we parametrize the space into scaled pulse functions, , centered at the geometric center of each cube of the grid. The numbers are for us to decide, depending on how much accuracy is desired. We use known triangular pulse functions here, and hence their Fourier transforms are known. The weights that we assign to each pulse function can be written down as or .

What we’ve essentially done is that we’ve located the (angular) coordinates across which the celestial body extends, zoomed in on those coordinates, drawing up a grid across this space, and then drawn pulse functions inside each cube of this grid. Why have we done this though? We want to convert the difficult calculation of integration into the relatively simpler calculation of a linear matrix operation. In other words, we can calculate from the values of using just simple linear algebra. However, we need in order to do that, and we calculate the value of below.
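The step from integration to a linear matrix operation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field of view, grid size, baseline coordinates and weights are made-up values, the function names are hypothetical, and the pulse is taken to be a separable triangle of half-width `delta`, whose Fourier transform is a product of squared sinc functions.

```python
import numpy as np


def triangle_pulse_ft(u, v, delta):
    """Fourier transform of the separable pulse tri(l/delta) * tri(m/delta).
    Note that np.sinc(x) is sin(pi x) / (pi x)."""
    return (delta * np.sinc(delta * u) ** 2) * (delta * np.sinc(delta * v) ** 2)


def build_measurement_matrix(uv, centers, delta):
    """Matrix A such that A @ x gives the visibilities of the pulse image
    with weights x. Row = one (u, v) sample, column = one pulse."""
    u = uv[:, 0][:, None]
    v = uv[:, 1][:, None]
    l = centers[:, 0][None, :]
    m = centers[:, 1][None, :]
    # Shifting a pulse to (l, m) multiplies its transform by a phase ramp.
    shift = np.exp(-2j * np.pi * (u * l + v * m))
    return triangle_pulse_ft(u, v, delta) * shift


# Hypothetical scene: a 4 x 4 grid over a field of view of 1e-9 radians.
n_side = 4
fov = 1e-9
delta = fov / n_side
axis = (np.arange(n_side) + 0.5) * delta - fov / 2
ll, mm = np.meshgrid(axis, axis, indexing="ij")
centers = np.column_stack([ll.ravel(), mm.ravel()])

# Hypothetical baselines (in wavelengths) and pulse weights.
uv = np.array([[0.0, 0.0], [2e8, -1e8], [5e8, 3e8]])
x = np.linspace(0.1, 1.0, centers.shape[0])

A = build_measurement_matrix(uv, centers, delta)
visibilities = A @ x
```

Once `A` has been built, going from the weights to the visibilities is a single matrix-vector product, with no integration involved.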

We need to construct an image with the values of the bispectra that we have. We use a maximum a posteriori (MAP) estimate. What this means is the following: consider the vector as representing the weights of , as described above. If the observed bispectra (used instead of visibilities because these values don’t change with atmospheric conditions) are denoted as , then we need to minimize the energy . Clearly here is fixed, and we vary the values of $x$ to find the optimal one.

The issue at hand is slightly more subtle than I let on. and can only be calculated if we already know what is. However, at this stage, we clearly don’t. Hence, what is probably happening is an iteration: we start with an image that we obtain after patching together information from the patch prior (hence the value of is known). We use this value of to determine and . Using these, we now calculate a better value for $x$ by minimizing the function , and so on. After having found an optimal value for , we can use it to reconstruct the image. In practice, we don’t minimize this function directly, but use a slightly different algorithm that gives us the same result. This algorithm is explained below:

We use the “Half Quadratic Splitting” method. Start with a blurry image, with a known value of . Use this value, and the value of a varying parameter , to calculate the most likely patches , which may be thought of as the most likely sub-images in the grid that can be overlapped and put together to form the complete image. This information is now used to calculate a better value of . This value can now again be used to calculate the most likely patches , with a different value of $\beta$. The values of for which we run this iteration are . After this process is completed, we will have gained a “good” value of , using which we can construct .
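The alternating structure of Half Quadratic Splitting can be sketched on a toy problem. To be clear about what is assumed: this is not CHIRP. I swap the patch prior for a simple L1 (sparsity) prior, use a linear measurement matrix `A` instead of bispectra, and pick the schedule of $\beta$ values myself. Only the back-and-forth between a "prior step" and a "data step" is the same idea.

```python
import numpy as np

def soft_threshold(v, t):
    """Prior step for an L1 prior: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def half_quadratic_splitting(A, y, lam=0.01,
                             betas=(1, 4, 8, 16, 32, 64, 128, 256, 512)):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1.

    An auxiliary variable z is tied to x through the penalty
    (beta/2)*||x - z||^2, and beta is increased so that x and z must agree.
    The beta schedule here is an assumption made for illustration.
    """
    n = A.shape[1]
    x = np.zeros(n)                      # crude starting estimate
    AtA, Aty = A.T @ A, A.T @ y
    for beta in betas:
        # Step 1: most likely z given the current x (prior step).
        z = soft_threshold(x, lam / beta)
        # Step 2: best x given z (a quadratic problem, solved exactly).
        x = np.linalg.solve(AtA + beta * np.eye(n), Aty + beta * z)
    return x

# Hypothetical noise-free toy data: a sparse vector seen through a random matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.0, 2.0, -1.5]
y = A @ x_true

x_hat = half_quadratic_splitting(A, y)
relative_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

In CHIRP the first step would instead pick the most likely patches under the patch prior, and the second step would update the pulse weights against the bispectrum data.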

So does the CHIRP algorithm perform better than SQUEEZE, BSMEM or other algorithms? Let us look at the pictures below.

Note that the data, based on which these images were created, are a result of simulations, and not actual observations. The authors made their very extensive and useful dataset available here.

The CHIRP algorithm was also found to be more robust to random (Gaussian) noise.

What effect did patch priors have? Patch priors allow for the algorithm to train itself on what the image “probably” looks like, given the input data . Patch priors helped the algorithm to piece together a much better picture of celestial bodies.

The authors contend that CHIRP is a better image processing algorithm than other state-of-the-art algorithms. And they did prove it with a magnificent picture of the event horizon of a black hole 4 years later, for all of eternity to stare at with moist eyes.

So, why indeed are plants green? One might say it is because of chlorophyll, which is green in color. But why is chlorophyll green? Of course, one answer to this question is “because it is green”. But a more satisfying answer is that chlorophyll being green ensures that plants receive a steady, not-wildly-fluctuating supply of energy.

A plant, or at least the light harvesting parts of a plant, may be thought of as an antenna. The extremal points of a plant absorb energy from sunlight, and transfer this energy via various pathways to the parts where this energy is needed in order to make food and sustain other life-giving activities. Sunlight contains a spectrum of wavelengths, and plants probably want to absorb all wavelengths in order to maximize their intake. However, absorbing the green frequency would lead to a lot of variance in the amount of energy absorbed. Hence, to reduce this variance, plants just reflect this green part of the solar light, and absorb the red and blue parts.

And that is why plants appear green.

Ensuring that the energy input into a network or grid is equal to the energy output is a fundamental requirement of networks. If excess energy is absorbed, it may destroy the system, and if not enough energy is absorbed, an underpowered system will soon shut down. However, the environment of a plant can vary rapidly with time. The sun can be covered by clouds, the plants above a light absorbing leaf may sway with the wind and hence block access to sunlight at intervals, etc. How can a plant ensure that it receives a steady supply of energy? Clearly, much like we need a constant amount of food every day, the energy that a plant delivers to its food making parts needs to be constant for the plant to survive.

If the energy absorbed by a plant at a fixed point in time can be plotted on a graph, with different probabilities given to different amounts of energy absorbed, the greater the variance in this graph, the more the variance in energy absorbed by a plant. This variance, which is called **noise**, should be reduced. Reducing noise is going to be our main motive as we do the analysis below. Methods of reducing noise like adaptive noise filtering require external intervention, and hence are not available to plants.

Imagine a network with a single input node , that absorbs light at wavelength with (maximum) power . Note that the average power absorbed does not have to equal due to changing external conditions like swaying plants blocking sunlight, etc. Hence, the average power absorbed by a plant is , where , and can be thought of as the probability of the plant absorbing .

Let the energy output be . If , then the average energy input, which is , would always be less than output. Hence, in this model of only one input node , so that the average energy input is equal to output. In other words, .

Let us now calculate the variance of the energy received. If is the probability that the plant is able to receive energy, and is the probability of the plant receiving energy, then the variance is . This can be simplified as
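The one-node calculation can be checked numerically. This is a sketch under an assumed model, with labels of my own: the node delivers power `power` with probability `p_abs` and nothing otherwise, and the mean input is what the plant must output. Under that assumption the variance works out to (mean) × (power − mean).

```python
def one_node_variance(p_abs, power):
    """Mean and variance of the power received by a single input node.

    Assumed model: the node delivers `power` with probability `p_abs`
    and zero power with probability 1 - p_abs.
    """
    mean = p_abs * power
    variance = p_abs * (power - mean) ** 2 + (1 - p_abs) * (0 - mean) ** 2
    return mean, variance

# Hypothetical numbers: both nodes below average 1.0 unit of power.
mean_a, var_a = one_node_variance(p_abs=0.5, power=2.0)
mean_b, var_b = one_node_variance(p_abs=0.25, power=4.0)
```

Both examples deliver the same average power, but the node that absorbs a lot of power rarely has the larger variance, i.e. it is the noisier of the two.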

We should look for ways to reduce this variance. We can do this by having two nodes instead of one.

Let us now have a network with two input nodes, and see if we can reduce variance. Let the input nodes and absorb light at frequencies . Let the power absorbed be with probabilities . We will assume that . Also, we want , as the average power input should be equal to the output. One constraint of the system is that the plant shouldn’t absorb power, because . Hence the possibilities of power absorption are that the plant absorbs power, power, or power. The variance of the model is now . This can be simplified as
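A numerical sketch of the two-node model, again with my own labels and assumptions: node A delivers `power_a` with probability `p_a`, node B delivers `power_b` with probability `p_b`, the two never fire together, and the remaining probability corresponds to absorbing nothing. The comparison with the one-node case uses the same required output.

```python
def two_node_variance(p_a, power_a, p_b, power_b):
    """Mean and variance of received power for two mutually exclusive nodes.

    Assumed outcomes: power_a (prob p_a), power_b (prob p_b),
    or zero power (prob 1 - p_a - p_b).
    """
    p_none = 1.0 - p_a - p_b
    if p_none < -1e-12:
        raise ValueError("p_a + p_b must not exceed 1")
    mean = p_a * power_a + p_b * power_b
    variance = (p_a * (power_a - mean) ** 2
                + p_b * (power_b - mean) ** 2
                + p_none * (0 - mean) ** 2)
    return mean, variance

# Hypothetical numbers: the required output 1.0 sits between the two node powers.
mean_two, var_two = two_node_variance(p_a=0.5, power_a=1.5, p_b=0.5, power_b=0.5)

# One node with the same larger power and the same mean output, for comparison.
var_one = mean_two * (1.5 - mean_two)
```

With these numbers the two-node variance is 0.25 against 0.5 for the single node, and when the probabilities sum to one it equals (power_a − mean) × (mean − power_b).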

Clearly, this is smaller than the variance we get from the network with just one node. The good news is that plants also have two input nodes- chlorophyll a and chlorophyll b. The presence of two input nodes for a given wavelength probably served as an evolutionary advantage in order to minimize noise in energy absorption.

We want plants to absorb energy at a steady rate, to ensure that energy input=energy output. We want to maximize , where , so that the **noise**, or variance in energy absorption, is minimized. Our constraint is that . And we want to do this for all the wavelengths of light that we can.

Now the maximum power available depends upon the sunlight available, and is given below as the black curve in the graph. Ignore the two peaks for now.

Hence, we can ideally select two nodes each for the blue, green and red regions of the wavelength spectrum, and absorb energy from each of them. In order to reduce noise however, we need to maximize . This can be done if we place two nodes in each of the three regions, and let the two nodes have very similar wavelengths, but different ‘s. This can be done easily where the slope of the irradiance is high. We can see in the graph above that the slope of the irradiance graph is high in the blue and red regions. However, in the green region, the slope is close to $0$. Hence, if we place two nodes there with similar wavelengths, will be almost $0$, and hence there will be a lot of noise in the energy input.

This is the reason why plants have two nodes each only in the red and blue regions of the light spectrum, and not the green region. The green light is reflected, and this is why plants are green.

Purple bacteria and green sulphur bacteria can be modeled using the same constraint of reducing noise in energy absorption. Hence, the scientific model developed by the authors is robust, and can explain the color of much of the flora and fauna found on the planet.

When I was a school student in India, I often came across JC Bose’s claims of plants being sentient beings, having nervous systems, etc. However, these things were never part of the official curriculum (i.e. we never had to learn these things for tests). Bose’s statements in this matter have always been considered to be something of a not-completely-scientific, “the whole universe is connected by prāna”-type of claim by the larger scientific community. This paper asserts that despite initial rejection, most of Bose’s claims have been proven to be correct by modern science in recent times.

By the time Bose retired from Presidency College, he was a world renowned physicist who was known to have studied radio waves even before Marconi (although the primacy debate is a complex one, there is evidence to suggest that there were scientists in Europe who had studied radio waves even before Bose). After retiring, Bose started working at Bose Institute (which he founded), and guided by his “Unity of Life” philosophy, started studying the effect of radio waves on inorganic matter. Finding their response to be “similar to animal muscle”, he now started studying plant physiology (the nervous system of plants). He would expose plants to various stimuli, and record their response through the use of ingenious instruments that he himself designed. His conclusion was that the nervous impulses of plants were similar to those of animals.

Bose studied both large plant parts and individual plant cells. He would connect microelectrodes to these cells, and record their response to stimuli. He concluded that plants contain receptors of stimuli, nerve cells that code these stimuli electrically and propagate these messages, and also motor organs that purportedly helped in carrying out a response to the stimuli. In this, he concluded that plants and animals have similar nervous systems. Bose said that the nervous system in plants was responsible for things like photosynthesis, ascent of sap, response to light, etc.

Bose said that the action potential of plant neurons follows the unipolarity of animal neurons. But what is Action Potential? This is an amazing video explanation of what it is. Action Potential is the electric potential via which neurons transmit messages. In resting state, the electric potential difference between the inside and the outside of neurons is -70 mV (the inside is negatively charged). When neurotransmitters activate the neuron (because a message is to be passed), this negative potential difference is destroyed by a stream of positive sodium ions that comes into the neuron from the outside. This causes lots of changes to the neuron, including inducing it to release neurotransmitters to activate the next neuron in line. The electric potential difference becomes positive, and then becomes negative again because the neuron loses a lot of potassium ions to the outside. The sodium-potassium pump on the cell membrane expends energy to exchange sodium and potassium ions to ensure that the neuron returns to the state it was in before it was excited. Thus, the neuron enters the resting state again. This is the chemical mechanism by which a neuron conducts a message.

Where can one find the “nerves” of plants? Bose localized the nervous tissue in the phloem, which conducted both efferent and afferent nervous impulses. He also measured the speed of the nervous impulse, which he found to be 400 mm/sec. Although Burdon-Sanderson and Darwin had previously reported on nerve impulses in insectivorous plants, Bose’s studies over the next three decades were far wider and deeper. Although ignored after the 1930s, his studies have been found to be correct by modern experiments. The author claims that Baluska et al have not only confirmed Bose’s major findings, but have also advanced these further utilizing molecular biology, genomics, etc. Baluska seems to have published this paper in a journal that he himself is the editor of. Hence, these claims perhaps need to be investigated further.

Along with Action Potentials (APs) (common to plants and animals), Slow Wave Potentials (SWPs) or Variation Potentials (VPs) (found only in plants) are also used by plants to transmit nerve impulses. These SWPs do not propagate electrically, but by hydraulic pressure exerted by tissues found in the xylem. Some plants like Dionaea flytraps were found to possess unidirectional APs similar to those found in cardiac myocytes (cardiac muscle cells). This prompted Bose to poetically state that plants possess hearts that beat as long as they live.

At the molecular level, plants possess voltage-gated channels (membrane proteins that are activated by a change in electric potential and allow the exchange of ions), a vesicular trafficking apparatus (for the transport of proteins and other molecules within the cytoplasm) etc, all of which are also found in animal cells. Trewavas also observed that water soluble ions were responsible for intra-cell communication, and also for inducing changes in plants as a response to environmental conditions. We now know that there exist many such water-soluble (these are called cytosolic) messengers in plants, as they do in animals.

Darwin had pointed out that the tip of the radicle (found in the roots) is endowed with sensitivity, and also directs the movements of adjoining parts. In this, it is like the brain.

Bose elaborated on this by saying that the radicle is stimulated by friction and the chemical constitution of the surrounding soil. The cells undergo contraction at appropriate times, causing their liquid contents to go up. This causes the ascent of sap. Baluska et al carried these claims even further, and stated that within the root apex of the maize plant, there is a “command centre” which facilitates the long distance travel of nervous impulses, and instructs the plant to move towards positive stimuli (and away from negative stimuli). Tandon rejects the notion that such a command centre is anywhere near as complex as an animal brain.

Bose found the nerve cells in plants to be elongated tubes, and the dividing membrane between them to be the synapse (the gap between animal neurons where messages are transmitted between neurons). This claim has been substantiated by Barlow, who said that plant synapses share many characteristics with animal synapses. Plants also use many of the same neurotransmitters as animals, like acetylcholine, glutamate and γ-aminobutyric acid.

Bose claimed that plants are intelligent, have memory, and are capable of learning. Tandon makes the claim that Trewavas describes a large number of protein kinases in plant neural pathways, and hence finds their nervous system to be similar to that of animals. On skimming Trewavas’ paper however, I mostly found it to say that although there do exist protein kinases in plants, the neural systems found in plants differ from those of animals in important ways.

In another paper, Trewavas claims that one important difference between plant and animal nervous systems is the timescale of response- plants respond much more slowly to external stimuli. Hence, we need time-lapse photography to properly study the plant neural response. Also, if intelligence can be thought of as “a capacity for problem solving”, then plants show signs of intelligence as they change their architecture, physiology and phenotype in order to compete for resources, forage for food, and protect themselves against harsh elements of the environment.

Barlow substantiates these arguments, claiming that plants rapidly convert external stimuli to electrochemical signals, which cause them to change their physiology. He also claims that plants do have memory, as their decision making would involve recollection of previously stored memories. Barlow also says that roots experience four stimuli at once (touch, gravity, humidity and light), and have to decide how to obtain the optimal mix of all. Hence, plants do possess the decision making aspect of intelligence.

With regard to plant intelligence, Baluska et al make the following claim:

“Recent advances in chemical ecology reveal the astonishing communicative complexity of higher plants as exemplified by the battery of volatile substances which they produce and sense in order to share with other organisms information about their physiological state”

Gruntman and Novoplansky from Israel also claim that *B. dactyloides* plants are able to differentiate between themselves and other plants, and if a plant has multiple roots, each set of roots identifies the other as belonging to the same plant (and hence these roots don’t compete with each other for resources). But how do plants recognize themselves and others? The authors claim that this is from the internal oscillations of hormones like auxin and cytokinins. The frequency of this oscillation is unique to each plant, and can be measured externally by roots.

Bose claimed that

“these trees have a life like ours……they eat and grow…….face poverty, sorrows and sufferings. This poverty may……induce them to steal and rob…….they help each other, develop friendships, sacrifice their lives for their children”

The author finds that this sentiment is not yet fully supported by the scientific data collected by Bose. However, these claims may be further ratified when more experiments are done in this realm.

Bose single-handedly created the field of Plant Neurobiology. Although the establishment of this field has its opponents, even the most vocal of these opponents cannot find fault with any of Bose’s scientific claims. The author hopes that plant and animal neuroscientists communicate better with each other in the future, and find the time and resources to study this field more. Hopefully, such studies will ratify even more of Bose’s revolutionary ideas and claims.

“Can machines think?” We soon realize that we first need to define the terms “machine” and “think”. Let us set the scene: there are three people- A (man), B (woman) and C (interrogator). None of them can see each other or speak, and they can only communicate through typewritten messages. The interrogator (C) has the job of determining the genders of A and B (he doesn’t know that A is a man and B a woman). C can ask them any number of questions. A’s job is to mislead C into thinking that he is a woman. B’s job is to convince C that she is the woman, and that A is a man. Both A and B can lie in their answers to C.

Clearly, lying and misleading are traits of human beings, and require thought. Suppose we replace A (man) by a machine. Will the machine be as convincing a liar as A? Will it be able to dodge C’s questions as skillfully as A? If the probability of C calling out the machine’s lies is less than or equal to the probability of him/her calling out A’s lies, then Turing says that that implies that the machine can “**think**”. Testing whether someone or something can “think” would involve having them accomplish a task that would require considerable thought and planning. Lying and misleading clearly fall into that category. Without going into the metaphysics of what exactly “thought” is and why inanimate objects *can’t* think because they don’t own big squishy brains like us, I think this is a good definition of what it means for a machine to think.

What is a “machine” though? In order to do away with objections like “humans are also machines” etc, Turing says all machines that are considered in this paper are “digital computers”. This may sound like a very restrictive definition. However, he says that digital computers, given enough memory, can imitate all other machines (and are hence universal machines). This claim will be partly proved below. Hence, if we wish to prove that there exists some machine capable of a certain task, it is equivalent to proving that there exists a digital computer capable of that task.

One of the more fascinating parts of this section is where Turing describes how the storage of a computer would work, and how a loop program works. He then gives an example of a loop in human life, which is what a loop program is meant to imitate:

To take a domestic analogy. Suppose Mother wants Tommy to call at the cobbler’s every morning on his way to school to see if her shoes are done, and she can ask him afresh every morning. Alternatively, she can stick up a notice once and for all in the hall which he will see when he leaves for school and which tells him to call for the shoes, and also to destroy the notice when he comes back if he has the shoes with him.

Although fascinating, the details of such functions are quite well known, and hence I won’t elaborate more on them here.

A “machine” is a pretty broad term in general. It has certain states, and rules for how the machine will behave when it is in that state. For instance, the universe may be thought of as a machine. Its initial state was that it was very very hot at first. God’s Rule Book said that in such a state, the universe had to expand and cool down, and that’s exactly what it did. The universe could be described as a continuous state machine- if its initial state was, say, even one millionth of a degree cooler, it would have evolved very differently. Hence, we can say that the universe is a machine that is **very sensitive** to initial conditions, and hence is a **continuous state machine**.

The opposite of a continuous state machine is a **discrete state machine**. This is a machine that is not very sensitive to initial conditions. An example would be your stereo system. Imagine that you have a stereo system with one of those old fashioned volume knobs that you turn to increase or decrease the volume. It won’t make that much of a difference to your audio experience if you turn the knob slightly more than you intended to. 70 dB will sound almost the same as 71 dB. Hence, what a lot of companies do is that they break up the volume level on your stereo into discrete states. Because 70 dB is almost the same as 71 dB, they can both be clubbed into the volume level “15”. When you turn the volume knob, and see the volume bar in the visual display go up by 1, you’ve moved up from one state to the next. In this way, your stereo system is clearly a discrete state system.

Digital computers are discrete state machines. Discrete state machines are stable to small changes in initial conditions, and hence it becomes easy to predict their behavior in the future, based on knowing their initial state.

Turing then makes a fantastic prediction in this paper:

..I believe that at the end of this century…one will be able to speak of machines thinking without expecting to be contradicted.

Turing claims that this is only a conjecture, but that making conjectures is good for science. This is one conjecture that is perhaps still hotly debated across the world, and “verify you are a human” tests on many websites regularly play *The Imitation Game* with us.

Turing anticipates many objections to his claim that machines can “think” if they can mislead other humans. And he deals with these anticipated objections one by one.

The *Theological Objection* might be that God provides immortal souls capable of thinking only to humans, and not animals or machines. Turing contends that “we should not irreverently usurp his power of creating souls”, and that He might indeed have provided all objects, animate and inanimate, with souls capable of thought. Turing also clarifies that religious arguments don’t impress him anyway. *The Head in the Sand Objection* contends that contemplating machines having the capacity for thought is too scary, and hence we should assume that they will never gain this facility. Intelligent people, basing their superiority on their capacity for thought, often make this argument. Turing thinks that this argument is too weak to even argue against, and that such “intelligent people” just need consolation.

*The Mathematical Argument* against machines being able to imitate humans well is that every machine is based on a certain logical system. Gödel proved that to every such logical system, you can ask a Yes/No question that it can only answer incorrectly, or never be able to answer. As humans (namely C, in this context) don’t have such a handicap, we can easily find out whether A is a machine or not. However, knowing what this question might be is going to be near impossible, as we wouldn’t know what type of a machine A might be (even if we are fairly sure it is a machine). Hence, we might not know what question to ask for which the machine will falter. Moreover, it is also entirely possible that humans would also give the incorrect answer to this question (as it is likely to be a tough logical question). In addition to this, what if this one machine is actually a collection of digital machines, with different such “trap” questions for each? It is going to be tough finding a question for which all such machines will fail. Therefore, this mathematical objection, although valid, is not something we need to worry about too much.

*The Argument from Consciousness* says that machines cannot think unless they can write a sonnet, feel pleasure at their success, feel warm because of flattery, etc. Turing contends that there is no way of knowing whether machines can feel any of this unless we can become the machine itself, which is impossible. One way to ascertain the emotional content of a machine is to conduct a viva voce, in which answering the questions posed would require some amount of emotional processing. The example that Turing provides, which he expects a human and a machine to be able to conduct, is given below:

Lady Lovelace, who described in detail *The Analytical Engine* that Charles Babbage designed, claimed that machines cannot do anything original, and can only do what they’re programmed to do. Turing contends that although the kinds of machines that Lady Lovelace could see perhaps led her to that conclusion, she was incorrect, and that even The Analytical Engine could be suitably programmed such that it could do “original” things. One variation of her statement could be that “machines cannot do anything new”. However, even the “new” things that humans do are inspired, at the very least, by their own experiences. Hence, an even better variant would be “machines cannot surprise us”. This is also incorrect, as humans often make calculations that are slipshod, that lead them to certain conclusions. When they ask machines for answers, they’re often surprised with the (correct) answer that the machines provide. An analogy would be when we incorrectly calculate that preparing for a certain exam would take us one all nighter, and are surprised by how bad that plan was. We did not correctly calculate our productivity over the course of one night.

The *Argument from Continuity* says that the nervous system is a continuous state system, and a digital computer is a discrete state system. Hence, a digital computer can never successfully imitate a human being. Turing counters this by saying that a digital computer is capable of imitating continuous state systems. Take a differential analyzer for instance, which is a continuous state system. If asked for the value of $\pi$, the differential analyzer might give any value between $3.12$ and $3.16$, based on its current state. This is a feature of continuous state systems- even a small deviation from the “ideal” state brings about a noticeable change in the output. This could be imitated by a digital computer by having it output $3.12, 3.13, 3.14, 3.15, 3.16$ with probabilities of $0.05, 0.15, 0.55, 0.19, 0.06$.
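The imitation trick is easy to sketch. The values and probabilities below are the ones I recall from Turing's example of a differential analyzer asked for $\pi$ (3.12 to 3.16, with probabilities 0.05, 0.15, 0.55, 0.19, 0.06); treat them as illustrative, and the function name as my own.

```python
import random

# Possible answers and how often the digital computer should give each one.
VALUES = [3.12, 3.13, 3.14, 3.15, 3.16]
PROBABILITIES = [0.05, 0.15, 0.55, 0.19, 0.06]

def imitate_analyzer(rng):
    """Return one 'analyzer-like' answer for pi, drawn at random."""
    return rng.choices(VALUES, weights=PROBABILITIES, k=1)[0]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
answers = [imitate_analyzer(rng) for _ in range(1000)]
```

An interrogator who only sees the answers cannot tell whether the scatter comes from a continuous machine drifting around its ideal state or from a discrete machine rolling dice.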

*The Argument from Informality of Behavior* says that machines follow a rule book, which tells them how to behave under certain circumstances. Moreover, the behavior of a machine can be completely studied in a reasonable amount of time, such that we’ll be able to predict the behavior of a machine perfectly in any given situation. Humans are not predictable in such a fashion. Hence, humans and machines are different, and can be easily distinguished. Turing argues against this by saying that it is possible that such a “rule book” for humans may also exist, and the unpredictability of humans is just a result of the fact that we haven’t found all the rules in the book yet. Moreover, he says he has written a program on a relatively simple **Manchester** computer, which when supplied with one 16 digit number returns another such number within 2 seconds. He claims that humans will not be able to predict what number this program returns even if they get a thousand years to study the machine.

One of the more interesting sections of the paper is where Turing says that if player B is telepathic, then this game would break down, as she (player B is a woman) would easily be able to pass tests like ‘C asks “What number am I thinking?”‘. Clearly, a machine would be unable to think of this number with any degree of certainty. While Turing contends that Telepathy is indeed real, he overcomes this problem by suggesting that all the participants sit in “telepathy proof” rooms.

Turing says that in some sense, a machine can be made to become like a “super-critical” brain (something which is capable of devising plans of action after being given an idea), and that in another sense a brain is also a machine. However, these ideas are likely to be contested. Proof that a learning machine can exist can be given only when one such machine is constructed, which for all practical purposes lies far ahead in the future.

But what are the constraints when one tries to construct such a machine? Turing says that a human brain uses about binary digits to process ideas and “think”. These binary digits can be thought of as analogues to neurons, which are $1$ when they fire, and $0$ when they don’t. A particular combination of neuron firing leads to resultant thoughts and actions. Turing thinks that the storage space needed for containing this many binary digits can be easily added to a computer. Hence, there is no physical constraint on constructing this device. The only question is, how do we *program* a computer to behave like a human being?

One could theoretically observe human beings closely, see how they behave under all possible circumstances, and then program a computer to behave in exactly that way. However, such an endeavor is likely to fail. One can instead program a computer to behave like a **human child**, and then make it experience the same things that a human child experiences (education, interacting with others, etc). Because a child is mostly a blank slate, and forms its picture of the world and behavioral paradigms based on its experiences, a computer may learn how to be a human adult (and hence imitate a human adult) in exactly the same way. Turing concedes that constructing a “child machine” is a difficult task, and that some trial and error is required.

A child learns through a “reward and punishment” system. What rewards and punishments can a teacher possibly give to a machine? A machine has to be programmed to respond to rewards and punishments much like a child does. It should repeat behaviors for which it is praised by the teacher, and not repeat behaviors for which it is scolded or punished. Also, it can be fed with programs that ask it to do exactly as the teacher says. Although inferring how the machine should behave based on fuzzy input from the external world might lead to mistakes, Turing contends that this is not any more likely than “falling over unfenced cliffs”. Moreover, suitable imperatives can be fed into the machine to further reduce such errors.

A critical feature of “learning machines” is that they change their rules of behavior with time. Suppose they have been programmed to go to school every day at 8 am. One day, the teacher announces that school will begin at 9 am the next day. Learning machines are able to change the 8 am rule to 9 am. Adapting rules of behavior based on external input is also a feature of human beings.
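One way to picture the school-time example is to store the machine's rules as data that can be overwritten, rather than as fixed code. The sketch below assumes exactly that; `RuleBook`, `announce`, and the `"school_start"` key are hypothetical names chosen for illustration.

```python
class RuleBook:
    """Hypothetical store of behavioral rules that can be revised at run time."""

    def __init__(self, rules):
        self.rules = dict(rules)

    def announce(self, name, new_value):
        # Replace a rule with the announced value and return the old one.
        old_value = self.rules.get(name)
        self.rules[name] = new_value
        return old_value


rules = RuleBook({"school_start": "8 am"})
previous = rules.announce("school_start", "9 am")
print(previous, "->", rules.rules["school_start"])
```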

How does a machine choose its mode of behavior though? Suppose the parents say “I want you to be on your best behavior in front of the guests”. What does that mean? There are a lot of near solutions to this problem: the machine could sit in a corner silently throughout the party, it could perform dangerous tricks to enthrall and entertain the guests, etc. How does it know which of these solutions is best? Turing suggests that we give the machine a random element. Let us suppose that each of the “solutions” mentioned above is assigned a number from some fixed range. The machine could pick a number from that range at random, try out the solution corresponding to that number, evaluate the efficacy of that solution, and then pick another number. After a certain number of trials (maybe 15), it could pick the solution that proved most effective. Why not try every number in the range? Because that would take too much time and computation.
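The random trial procedure described above can be sketched as follows. This is a minimal illustration, assuming the candidate solutions are numbered 0 to `n_solutions - 1` and that some scoring function is available; `random_search`, `evaluate`, and the example scoring rule are all hypothetical.

```python
import random


def random_search(n_solutions, evaluate, trials=15, seed=None):
    """Try a random subset of numbered solutions and keep the best one found.

    n_solutions: how many candidate solutions exist (numbered 0..n_solutions-1).
    evaluate:    hypothetical scoring function, higher means more effective.
    trials:      how many distinct solutions to try.
    """
    rng = random.Random(seed)
    # Sample without replacement so no solution is tried twice.
    tried = rng.sample(range(n_solutions), k=min(trials, n_solutions))
    best = max(tried, key=evaluate)
    return best, evaluate(best), tried


# Made-up scoring rule: solution 42 is ideal, others are worse the further away they are.
best, score, tried = random_search(100, lambda i: -abs(i - 42), trials=15, seed=0)
print(best, score)
```

With 15 trials the machine usually finds a reasonable solution rather than the best one. Trying all the numbers would guarantee the best solution, at the cost of evaluating every candidate.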

By the time the machine is an adult, it will be difficult to accurately predict how the machine will behave in a certain situation, because we cannot know all the experiences that the machine has been through, and how its behavioral patterns have evolved. Hence, if we program something into it, it might behave completely unexpectedly; it might “surprise us”. Also, because its learned behavior and traits are unlikely to be perfect, it is expected to behave less than optimally in many situations, and hence mimic “human fallibility”. For all practical purposes, our machine will have become a human.

What is incredible to me is how closely this resembles the way neural networks learn. It is insane that Turing could envision such a futuristic technology more than half a century ago. Thanks for reading!