Researchers Train Machine Learning Model to Reason Abstractly

Kevin Ku /

One defining feature of human intelligence is the ability to reason abstractly about events around us.

It takes no conscious effort to know that crying and writing are both means of communicating. Similarly, we instinctively understand that an apple falling from a tree and a plane landing are both variations on descending.

Machines, on the other hand, are still learning to organize the world into such abstract categories. Recent studies have come closer to training machine learning models to reason about everyday actions.

A team of researchers presented one such study at the European Conference on Computer Vision this month.

The team unveiled a hybrid language-vision model that can compare and contrast a set of dynamic events captured on video. The model then teases out the high-level concept that connects the events.

In a statement, Aude Oliva, the study’s senior author and a senior research scientist at the Massachusetts Institute of Technology, said:

“We show that you can build abstraction into an AI system to perform ordinary visual reasoning tasks close to a human level.”

There’s more.

Training a Machine Learning Model to Reason Abstractly

Deep neural networks have gotten significantly better at recognizing objects and actions in photos. Researchers are now focusing on a new milestone: abstraction, or training models to reason about what they see.

To achieve this goal, the researchers used the relationships between word meanings to give the model visual reasoning power.

Mathew Monfort, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), explained:

“Language representations allow us to integrate contextual information learned from text databases into our visual models. Words like ‘running,’ ‘lifting,’ and ‘boxing’ share common characteristics that make them more closely related to the concept ‘exercising,’ for example, than ‘driving.’”

The team used WordNet, a database of word meanings, to map how each action-class label in their dataset relates to the other labels.

For example, they linked words like “sculpting,” “carving,” and “cutting” to higher-level concepts such as “crafting,” “cooking,” and “making art.” So, when the model recognizes sculpting activity, it can pick out conceptually similar activities.
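To make the idea concrete, here is a minimal sketch of how action labels might be linked to more abstract concepts. It does not use WordNet itself and is not the researchers’ pipeline: the `PARENTS` table, its specific links, and the function names are hand-written assumptions for illustration only.

```python
# Toy stand-in for a WordNet-style hierarchy. Each label points to its more
# abstract parent concepts. These links are hand-written assumptions, not
# data taken from WordNet or from the study.
PARENTS = {
    "sculpting": ["crafting", "making art"],
    "carving": ["crafting", "cooking"],
    "cutting": ["crafting", "cooking"],
    "crafting": ["making"],
    "cooking": ["making"],
    "making art": ["making"],
    "driving": ["travelling"],
    "making": [],
    "travelling": [],
}


def abstractions(label):
    """Return every higher-level concept reachable from `label`."""
    seen = set()
    stack = list(PARENTS.get(label, []))
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(PARENTS.get(concept, []))
    return seen


def shared_abstractions(labels):
    """Return the concepts that every label in `labels` has in common."""
    sets = [abstractions(label) for label in labels]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    print(shared_abstractions(["sculpting", "carving", "cutting"]))
    print(shared_abstractions(["sculpting", "driving"]))
```

In this toy table, “sculpting,” “carving,” and “cutting” all lead back to “crafting,” while “sculpting” and “driving” share no parent concept at all.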

In tests, the model performed as well as humans, and sometimes better, at two types of reasoning tasks:

  • Picking the video that conceptually completes a set
  • Identifying the footage that doesn’t fit

For example, after viewing the video of a dog barking and a man howling beside the dog, the model selected a crying baby to complete the set. What’s more, it picked that specific video from a group of five.
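The two tasks can be sketched with simple vector similarity. This is only an illustration under stated assumptions: the three-dimensional vectors in `EMBEDDINGS` are invented by hand, whereas the study’s model learns its representations from video and language, and the centroid-plus-cosine scoring used here is a common generic approach rather than the method reported by the researchers.

```python
import math

# Hand-made 3-d "concept" vectors, invented for illustration. A real model
# would learn such representations from video and text.
EMBEDDINGS = {
    "dog barking": (0.9, 0.1, 0.0),
    "man howling": (0.8, 0.2, 0.1),
    "baby crying": (0.85, 0.15, 0.05),
    "car driving": (0.0, 0.9, 0.1),
    "person cooking": (0.1, 0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(names):
    """Average the vectors of the named videos."""
    vectors = [EMBEDDINGS[name] for name in names]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def complete_set(reference, candidates):
    """Pick the candidate closest to the centroid of the reference set."""
    center = centroid(reference)
    return max(candidates, key=lambda name: cosine(EMBEDDINGS[name], center))


def odd_one_out(names):
    """Pick the video least similar to the centroid of all the others."""
    def score(name):
        others = [other for other in names if other != name]
        return cosine(EMBEDDINGS[name], centroid(others))
    return min(names, key=score)


if __name__ == "__main__":
    print(complete_set(["dog barking", "man howling"],
                       ["baby crying", "car driving", "person cooking"]))
    print(odd_one_out(["dog barking", "man howling", "baby crying", "car driving"]))
```

With these made-up vectors, the crying baby completes the barking-and-howling set, and the driving car is flagged as the clip that does not fit.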

“It’s a rich and efficient way to learn that could eventually lead to machine learning models that can understand analogies and are that much closer to communicating intelligently with us,” Oliva said.

The team replicated its results on two datasets used to train AI systems in action recognition: MIT’s Multi-Moments in Time and DeepMind’s Kinetics.

Here are the details about the study.

Sumbo Bello

Sumbo Bello is a creative writer who enjoys creating data-driven content for news sites. In his spare time, he plays basketball and listens to Coldplay.
