
Dusting Off an Old Math Theory to Take Machine Vision to the Next Level

Triff / Shutterstock.com

What could be the link between machine vision and a supposedly smart horse?

Clever Hans, or der Kluge Hans, was a horse that lived in Germany between 1895 and 1916. The animal was claimed to possess superior intelligence and became famous among the public and scientists around the world.

By tapping its hoof on the ground, Hans appeared able to add, subtract, multiply, divide, and perform many other arithmetic and intellectual tasks. Several scientists at the time investigated the “intelligence” of this controversial horse.

Then in 1907, Oskar Pfungst, a German psychologist, discovered what would be known in psychology as the Clever Hans Effect.

It turned out that Clever Hans was not really that clever. The horse was merely reading the unconscious body-language cues of its master and the people around it, which signaled when to stop tapping.

Outside psychology, the Clever Hans phenomenon has a counterpart in artificial intelligence, especially in the deep neural networks powering machine vision.

Novel Math for Enhancing Artificial Vision Machines

AI systems are getting pretty good at recognizing images, as well as objects and textures inside images, with ever-increasing efficiency and accuracy.

Machine learning systems are built on neural networks, which are loosely modeled on the brain’s biological neurons. In the case of machine vision, neural networks excel at pattern recognition by training on large datasets of images.

But as with Hans the “clever” horse, the “intelligence” of these deep learning systems simply emerges, with no real human control over it. Mattia Bergomi, a mathematician and neuroscientist working in the Systems Neuroscience Lab at the Champalimaud Centre for the Unknown (CCU) in Lisbon, said:

“[Clever Hans effect]… It’s the same with machine learning; there is no control over how it works or what it has learned during training. The machine having no a priori knowledge of faces, it just somehow does its stuff – and it works.”

Bergomi is a member of a group of mathematicians who wanted to answer this question:

“Could there be a way to inject some knowledge of the real world (about faces or other objects) into the neural network, before training, in order to cause it to explore a more limited space of possible features instead of considering them all – including those that are impossible in the real world?”

If you want your neural network to tell different cats apart, and to recognize them among other animals, you’ll need thousands of photos capturing cats in every imaginable position. The network then learns the task on its own.
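For a sense of what that standard training process looks like, here is a minimal, hypothetical sketch in PyTorch. The “cats/” folder, the network architecture, and the hyperparameters are illustrative assumptions, not details taken from the study.

# Minimal sketch of the brute-force training approach described above.
# The "cats/" folder, architecture, and hyperparameters are illustrative
# assumptions, not details from the study.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
train_set = datasets.ImageFolder("cats/", transform=transform)  # hypothetical dataset: one subfolder per cat
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, len(train_set.classes)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:  # thousands of labeled photos, batch by batch
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()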

But the fact remains that “nobody really knows what goes on inside [neural networks] as they learn their task. They are, basically, black boxes. You feed them something, they spit out something, and if you designed your electronic circuits properly, you’ll get the correct answer.”

Take self-driving vehicles, for example.

The deep neural networks onboard these cars have to distinguish road signs, a task that would be much easier if their training focused only on simple geometric shapes such as circles and triangles.

But how do you tell these neural networks to do that?

This is where a mathematical theory called “topological data analysis” (TDA) comes in. TDA was formulated in 1992 by Italian mathematician Patrizio Frosini, co-author of the new study.

Why Machine Vision Systems Need a Topological Sense

Unlike geometry, which measures lines and angles in rigid shapes, topology extends to more complex objects, classifying them according to their overall shape. In topology, a donut and a mug count as the same object, since one can be turned into the other purely by deformation, stretching and compressing without cutting or gluing.

This topological sense is absent in deep neural networks powering current machine vision systems.

Merely rotating an object can make it unrecognizable to a neural network. That’s why training typically involves memorizing every possible configuration of the object.
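To make that concrete, here is a toy sketch (an illustration, not the method used in the study): rotating a binary image of a shape changes its pixels completely, yet a crude topological feature, the number of holes, stays the same. The count_holes helper below is hypothetical and relies on scipy’s connected-component labeling.

# Toy illustration, not the study's method: pixels change under rotation,
# the hole count does not. Requires numpy and scipy.
import numpy as np
from scipy import ndimage

# A small binary image of a "P"-like shape: a ring (one hole) plus a stem.
shape = np.array([
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])

def count_holes(img):
    # Label connected components of the background; each component beyond
    # the single outer background region is an enclosed hole.
    _, n_background = ndimage.label(img == 0)
    return n_background - 1

rotated = np.rot90(shape)
print(np.abs(shape - rotated).sum())             # large pixel-level difference
print(count_holes(shape), count_holes(rotated))  # 1 and 1: topology unchanged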

TDA offers a way to avoid this time-consuming and resource-intensive approach to training machine vision systems. The team showed that TDA could make artificial vision systems learn to recognize complex images “spectacularly faster.”

By recognizing topological features, neural networks can identify complex objects without having to scan thousands of possible orientations. The team tested their math by training a neural network on hand-written digits, and the results “speak for themselves.”

“What we mathematically describe in our study is how to enforce certain symmetries, and this provides a strategy to build machine learning agents that are able to learn salient features from a few examples, by taking advantage of the knowledge injected as constraints,” says Bergomi.
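As a rough sketch of what “enforcing a symmetry” can look like in code (an illustration in the spirit of the paper’s group-equivariant operators, not its actual implementation), the hypothetical orbit_max helper below makes any image feature invariant to 90-degree rotations by pooling it over all four rotated copies of the input.

# Hedged toy sketch, not the paper's implementation: make any feature
# invariant to the group of 90-degree rotations (C4) by pooling it over
# the whole group orbit of the image.
import numpy as np

def orbit_max(feature_fn, image):
    # Evaluate the feature on all four 90-degree rotations and keep the max.
    # The set of rotated copies is the same for an image and for any of its
    # rotations, so the pooled value is rotation-invariant by construction.
    return max(feature_fn(np.rot90(image, k)) for k in range(4))

# Illustrative feature: total intensity in the top-left 2x2 patch.
feature = lambda img: float(img[:2, :2].sum())

img = np.arange(16.0).reshape(4, 4)
assert orbit_max(feature, img) == orbit_max(feature, np.rot90(img))

A learner restricted to symmetry-respecting features like this never has to spend training examples rediscovering that a rotated digit is still the same digit, which is the kind of saving the study reports.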

The results of the study are published in the journal Nature Machine Intelligence.

Read More: AI Vision Is Biased Toward Texture, Not Shape



Zayan Guedim

Trilingual poet, investigative journalist, and novelist. Zed loves tackling the big existential questions and all-things quantum.
