Aligning Machine Intelligence- MIRI
I am applying for a summer internship (or attendance at a workshop, or something like that) with the Machine Intelligence Research Institute (MIRI) at Berkeley, California. I have an interview tomorrow over Skype. In order to prep for it, I went on the MIRI website and read their mission statement and a paper outlining the various issues that researchers at MIRI try to address.
The paper in question is “Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda”. A link to the paper is here.
The paper says at the outset that MIRI does not concern itself with making machines more intelligent or more powerful, as that is already an active field of research with multiple institutes doing wonderful work in that direction. Instead, it deals with the question of how to make machines behave in a way that is aligned with human interests and expectations. Some of the topics discussed in the paper are:
- Real-world models– Machines can be compared based on a method similar to scientific induction- expose machines to a complex environment, and then make them develop models of the environment. If one machine's model predicts future events in the environment with a better success rate, then that model is better. This seems to be an effective way to compare machines. However, when machines are set to model the external (real) world, they themselves will be part of the environment. That makes their observations and conclusions questionable. For instance, if the machines are not waterproof, then rainfall in the external environment will be treated as a catastrophic event, and the machine will spend more time and resources studying ways of avoiding rainfall, which does not align with human interests.
Moreover, as machines are rewarded based on the maximization of a reward function, they may outsmart their human judges by finding ways of maximizing the function without creating the best models of the complex environment. This is similar to students gaming the system: as long as they can predict what types of questions will be asked, they can learn only the important sections of the textbook for the exam, without reading the full book and gaining a cohesive understanding of the material.
- Decision Theory– Given a situation, what decision must a machine take? At first, it sounds easy. Make a list of all possible actions, see which action maximizes the utility function, and then select that action. However, it is not clear how a machine would be able to exhaustively check all possible outcomes of all possible actions, and then select the one that is “best”. Because the depth of analysis it can perform may vary, it is also possible that in two environments that are identical in every aspect, the machine chooses different courses of action. Making a reliable machine which takes the same decision every time, after thoroughly analyzing the consequences of each possible action, is a difficult problem.
- Logical Uncertainty– Humans know that even when they understand a complex system arbitrarily well, it is often difficult to predict the events in the system. This is not because of a lack of information, but because of a limited capacity for deductive reasoning. For instance, if you were shown all 52 cards of a deck in order, and then the cards were shuffled in ways that you understand, you’d still have trouble predicting which card is on top after a million fast shuffles. This is because the human mind is mostly incapable of carrying out long calculations quickly. In such circumstances, we assign probabilities to various outcomes, and then select the outcome which has the highest probability.
A similar situation applies to smart machines- they will face situations in which they will not be able to predict events accurately. Hence, they will need to assign probabilities to outcomes too. The assignment has to be done in such a way that it maximizes the rate of successful predictions. Teaching a machine how to do that is an active area of research.
- Vingean reflection– This has to do with creating smart machines that can themselves create even smarter machines without human intervention. Let us assume that we can create a machine which weighs all possible courses of action, and selects the one which serves human interests best. It would then follow the same procedure to create a smarter machine. However, because the machine it creates will be smarter than itself, the parent machine will not be able to predict all courses of action that the child machine would take (if it could, it would be as smart as the child machine, which contradicts the hypothesis). Hence, an aligned machine may create a machine that is not aligned with human interests.
- Error-tolerant agent designs– Currently, if a machine is malfunctioning, we can just open it up and correct its code, hardware, etc. However, an intelligent machine, even if it is programmed to obey instructions when a human believes repairs are needed, may find ways of not obeying if it has an incentive to escape such meddling. In other words, although the machine has to follow the code which instructs it to listen to its human programmer, it may cleverly find another part of the code or its set of instructions which allows it to escape. Programming an intelligent machine to listen to humans is a difficult problem.
- Value Specification– The way humans are programmed to procreate is that the act of sex itself is pleasurable. Hence, we have a natural inclination to procreate. Although humans are aware of this, they don’t try to change the pleasurable nature of sex. In a similar way, intelligent machines can be programmed to follow human instructions if they are rewarded for doing so. A sufficiently intelligent machine can figure out this reward system, and may decide to change it. If the machine is no longer rewarded for following human instructions, it may soon go out of human control. Hence, programming machines to not change the reward system is an area of research.
Moreover, machines must inductively learn about human values and interests, as programming human interests into a computer is a fuzzy area at best. For instance, an exhaustive list of everything acceptable in human society is unlikely to be compiled anytime soon, and hence cannot be fed into a machine. However, a machine may learn about society by observing it, and then base its actions on what is acceptable to humans. This is analogous to the fact that although not every cat picture in the world can be fed into a machine, it can inductively learn what a cat looks like by trawling the internet for cat pictures, and then identify an arbitrary cat picture based on what it has learned.
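To make the Decision Theory item above a little more concrete, here is a minimal Python sketch of the "list all actions, score them with the utility function, pick the best" procedure. It is only an illustration under my own assumptions, not something taken from the paper: the action names, outcome probabilities, utility values, and the helper names `expected_utility` and `choose_action` are all made up for this example, loosely based on the rainfall scenario mentioned earlier.

```python
from typing import Dict

# Hypothetical toy data (not from the paper): each action maps to a
# probability distribution over outcomes.
ACTIONS: Dict[str, Dict[str, float]] = {
    "carry_umbrella": {"dry_but_encumbered": 1.0},
    "no_umbrella": {"dry": 0.7, "wet": 0.3},
}

# Hypothetical utility assigned to each outcome.
UTILITY: Dict[str, float] = {
    "dry": 1.0,
    "dry_but_encumbered": 0.8,
    "wet": -5.0,
}


def expected_utility(outcomes: Dict[str, float], utility: Dict[str, float]) -> float:
    """Probability-weighted average utility of one action's outcomes."""
    return sum(prob * utility[outcome] for outcome, prob in outcomes.items())


def choose_action(actions: Dict[str, Dict[str, float]], utility: Dict[str, float]) -> str:
    """Return the action with the highest expected utility.

    Actions are visited in sorted order, so a tie is always broken the
    same way and identical inputs always give the same decision.
    """
    return max(sorted(actions), key=lambda name: expected_utility(actions[name], utility))


if __name__ == "__main__":
    print(choose_action(ACTIONS, UTILITY))
```

With only two actions and three outcomes, the exhaustive check is trivial. The difficulty described above comes from the fact that, in the real world, neither the list of actions nor the list of outcomes can be enumerated like this.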
I had a great time understanding and writing about the Artificial Intelligence concerns of MIRI, and hope to understand more in the near future.