Like every other idiot with an internet connection, I am fascinated by machine learning and neural nets. My favorite aspect of AI is image recognition, and I’ve written about it in the past. I am going to try and talk about it in reference to a book I’ve recently been reading.
The book that I’ve been reading is “The Master and the Emissary” by Iain McGilchrist. It is hands down the most amazing work I’ve come across in the recent past, and I plan to write a more detailed review on completing it. However, there is one fact that I want to flesh out below.
The main thesis of the book is that the left and right hemispheres of the brain are largely independent entities, and often process the world in conflicting ways. The left part of the brain recognizes objects by “breaking them up into parts and then assembling the whole”, while the right part of the brain “observes the object as a whole”. On this account, the left part of the brain is horrible at recognizing objects and faces, and mainly deals only with routine tasks. The right part, on the other hand, is what we mainly depend on for recognizing things and people in all their three-dimensional glory.
Anyone with even a cursory understanding of how neural networks (something something convolutional neural nets) recognize objects knows that neural algorithms mainly resemble the left side of the brain. Image inputs are broken up into small pieces, and then the algorithm works on trying to identify the object under consideration. Maybe this is why image recognition is bad (much, much worse than humans for instance)? How can one program a “right brain” into neural nets?
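To make the “broken up into small pieces” part concrete, here is a minimal sketch in plain Python of the sliding-window step at the heart of a convolutional layer. It is only an illustration, not how any real library does it: the function names (extract_patches, filter_response, convolve), the toy 4x4 image and the hand-written edge filter are all my own assumptions, and a real network learns its filters and stacks many such layers.

```python
def extract_patches(image, size):
    """Cut a 2D image (list of rows) into every size x size patch,
    scanning left to right, top to bottom."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


def filter_response(patch, kernel):
    """Score one patch against one filter: multiply element-wise and sum."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


def convolve(image, kernel):
    """Build a feature map by scoring every patch of the image with the filter."""
    size = len(kernel)
    out_cols = len(image[0]) - size + 1
    responses = [filter_response(p, kernel) for p in extract_patches(image, size)]
    # Reshape the flat list of scores back into rows.
    return [responses[i:i + out_cols] for i in range(0, len(responses), out_cols)]


# Hypothetical toy input: dark on the left half, bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds where brightness rises from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve(image, vertical_edge)
for row in feature_map:
    print(row)
```

Each number in the feature map depends only on one small patch; the net never “sees” the whole image in a single step, which is the very left-brain flavour I am describing.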
I don’t know the answer to this. However, it now seems clear to me that a lot of our approach to science and programming in general is based on a Reductionist philosophy: if we can break things up into smaller and smaller units, we can then join together those fundamental units and figure out how the whole edifice works. This approach has been spectacularly successful in the past. However, I feel that it has mostly been misleading in certain problems (like image recognition). What could a possible roadmap for a solution look like?
The left and right hemispheres of the brain perform image recognition like this: the right brain processes the object in its entirety, and notices how it varies in relation to all other objects that it has seen before. For instance, when the right brain looks at you, it notices in what ways you’re different from the people around you, and also from the inanimate things in the background. The left brain then breaks up those images into smaller parts to notice similarities and differences, forms categories for “similar” things, and places all of the observed entities into those categories. For instance, it places all the people in the “humans” category, the trees in the background in the “trees” category, and so on. Hence, the right brain notices fine and subtle features of objects all in one go, and the left brain clubs objects together in a crazy Reductionist daze.
How would a neural network do “right brain” things? I’m tempted to say that there may be a lot of parallel computing involved. However, I don’t think that I understand this process well enough, because this line of thought inevitably leads to the opinion that we should just have a bazillion parameters that we try to fit to every image that we see. This is clearly wrong. However, it does seem to me that if we’re somehow able to build “right brain” algorithms into neural nets, image recognition may improve substantially. More on this later (when I understand more about what exactly is going on).