# Vision and Cognition

Quite eventful, these past 3 months. I presented my first publication at the World Congress on Natural and Bio-Inspired Computing in December, then attended the Microsoft Research India Winter School on machine learning and computer vision, followed by Kurukshetra, in January.

Some pics first: NaBIC 09, MSRI Winter School, Kurukshetra. Our NaBIC paper is here and our code is here. In short, we show that a simple iterative algorithm can find a very good photomosaic when compared to population-based methods like genetic algorithms. To induce some interest, here is one of our results.
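For the curious, the flavour of an iterative approach can be sketched in a few lines. This is a toy illustration, not the algorithm from the paper. It assumes each target cell and each tile is summarised by a single average grey value, that every tile is used exactly once, and that the search simply keeps swapping pairs of tiles whenever a swap lowers the squared error. The names `mosaic_cost` and `iterative_mosaic` are made up for this sketch.

```python
import random


def mosaic_cost(assignment, cells, tiles):
    """Total squared error when cell i is covered by tile assignment[i]."""
    return sum((cells[i] - tiles[t]) ** 2 for i, t in enumerate(assignment))


def iterative_mosaic(cells, tiles, sweeps=10, seed=0):
    """Toy local search: start from a random one-to-one assignment and
    swap the tiles of two cells whenever that reduces the error.

    Assumes len(cells) == len(tiles) and scalar grey-level features.
    """
    rng = random.Random(seed)
    assignment = list(range(len(cells)))
    rng.shuffle(assignment)
    for _ in range(sweeps):
        improved = False
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                ti, tj = assignment[i], assignment[j]
                before = (cells[i] - tiles[ti]) ** 2 + (cells[j] - tiles[tj]) ** 2
                after = (cells[i] - tiles[tj]) ** 2 + (cells[j] - tiles[ti]) ** 2
                if after < before:
                    # Swapping the two tiles strictly lowers the error.
                    assignment[i], assignment[j] = tj, ti
                    improved = True
        if not improved:
            break  # no swap helps any more: a local optimum
    return assignment


cells = [10, 200, 90, 30]   # average grey level of each target cell
tiles = [25, 95, 205, 5]    # average grey level of each available tile
result = iterative_mosaic(cells, tiles)
```

With scalar features like these, the swaps end up pairing dark cells with dark tiles and bright cells with bright tiles. Real photomosaics compare colour and structure rather than one number per tile, so the search space is much less forgiving.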

Now that the pictures have spoken the thousand words……

# Human Vision

I have been a follower of Predictably Irrational after watching Dan Ariely’s talks (here and here) on TED.com. Both talks are delightfully entertaining and demonstrate with wonderful examples the fallibility of our own decision-making capacity. For a while after each talk, I spent a lot of time probing into every decision I made, usually ending up almost undecided. For an undergrad who has to choose between a career and higher education, it couldn’t have come at a worse time. I was already in doubt whether I was applying to graduate schools simply because it was what the best among us were doing. Equally, I didn’t know whether it was simply cold feet or the safety of a job at Amazon India that was making me have second thoughts about higher education.

Coming back: Dan starts off by using visual illusions as a metaphor. Here’s an example:

Take a look at the image on the left. It seems absolutely impossible that squares marked A and B are the same. They seem to be opposing colors, but in fact they are exactly the same. The proof is on the image to the right. `What is going on here? How can it be that we see wrong? How can it be that even after we are shown that these two patches are identical we still can’t see them accurately when we look left again?`

Dan’s (more colorful) example is here. Quoting from his blog:

Now, vision is our best system. We have lots of practice with it (we see many hours in the day and for many years) and more of our brain is dedicated to vision than to any other activities. So consider this — if we make mistakes in vision, what is the chance that we would not make mistakes in other domains? Particularly in domains which are more complex (dealing with insurance, money, etc.), and ones in which we have less practice? Domains such as decision making and economic reasoning?

So millions of years of evolution of the eyes and the visual cortex (which, by the way, is among the largest systems in the human brain) have given us a visual system that can’t see the world for what it truly is, even after it is explicitly demonstrated? Not exactly…

# Gestalt Theory

One of the plenary talks at NaBIC was on Gestalt psychology. Gestalt in German means the shape of an entity’s complete form. The principle behind Gestalt theory is that our brain is very holistic and understands more than what the sum of the parts indicates. The concept itself is somewhat difficult to put down in words, but a few examples expose some interesting aspects of our cognitive system. For example, take a look at the following picture:

If at first you cannot make sense of the image, take a second to look at it before continuing with the text. The picture demonstrates `emergence`. After a while the scene with the Dalmatian dog sniffing the path emerges. One can even make out the fallen leaves, the crossroads and the trees in the background. We do not recognize the dog’s body parts and then put them together to form the concept of the dog! Instead we perceive the dog and then make sense of its parts. Gestalt psychology gives only a description of this phenomenon and does not provide any explanation as to how we do it (and it has been zinged a lot for that).

Reification

Another phenomenon is called `reification`. It is when we understand more than what we see. Look to the right. There is no triangle in A, but we perceive it, no rectangle in B, but we see it. We can sense the presence of a sphere in C and the (absent) surface of water over which the snake glides!

Multistability

Before this post becomes a copy of the wiki page, one last example. This phenomenon is called `multistability`. As you keep looking at the images to the left, you keep shifting between two interpretations of the same image. The first image is the Necker cube.

You can read more on Gestalt on its wiki page. For reviews of technical articles, see here.

There is already some criticism of the traditional statistics-based approach to computer vision, and it seems almost impossible for any current system to truly replicate any of the phenomena demonstrated above. But there have been some attempts.

It is said that almost every branch of science follows a steep sigmoid. There was consensus at the winter school (which included Jitendra Malik, Yann LeCun, Yair Weiss, Martin Wainwright, Bruno Olshausen, Richard Zemel, William Freeman and Manik Varma; I am mentioning only the CV people here) that computer vision is currently at the bottom end of that curve, on the verge of the steep rise, and that fifty years from now the best computer vision systems could be based on a completely different set of fundamentals.

# Understanding Motion

Federer in action

Point light display

The next level is understanding videos, or at least a sequence of images. But can’t we understand the action from a single image? Sure, if it’s an image like the one on the left, it says a lot about the action being performed and precludes the need to understand image sequences. But what about the image on the right? Umm, not so sure. It looks like a human, but it’s not trivially understood what it is or what it is doing.

Watch the related videos on YouTube.

That makes it a lot clearer. Dr. Malik presented a video on the research work of one Dr. Johansson from the 1970s (I have not been able to locate the original video; it’s not on YouTube). There has been a lot of work since then trying to understand our predisposition to recognize biological motion. Apparently babies only three months old can perceive as much. So can other mammals!

Some pages on biological motion detection:
