DeepMind's AI Mathletes Flunk High School Math Exam

DeepMind, Google’s premier artificial intelligence division, hit a snag in its advanced AI research after its neural models flunked a high school math test. According to the AI firm, despite the cutting-edge approach used to teach its AI mathletes mathematics, the algorithms struggled with problems that regular teenagers would usually ace.

The neural models managed only a score equivalent to an E in the British grading system. According to DeepMind’s paper, published online on arXiv.org, solving even the most straightforward math problems involves a great deal of brainpower, since people have to remember the order in which operations should be performed and convert word problems into equations.

As it turned out, even the most advanced artificial intelligence technology today is only built to run through data, look for patterns, and analyze them. Artificially intelligent machines lack the cognitive skills people use in solving math questions that require substitutions.

DeepMind’s AI Mathlete

For their study, DeepMind researchers reportedly taught their AI mathletes using a dataset consisting of different kinds of math problems. They synthesized the dataset to produce more training examples, control the difficulty level of the exams, and reduce the AI’s training time.
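
To make the idea of a synthesized dataset concrete, here is a minimal sketch of how question-and-answer pairs with a tunable difficulty level could be generated. It is not DeepMind's actual generator: the function name make_problem, the difficulty parameter, and the "What is ...?" phrasing are all assumptions made for illustration.

```python
import random

def make_problem(difficulty, rng):
    """Return a (question, answer) pair of strings.

    Hypothetical illustration only: higher difficulty means more
    terms and larger numbers. Not the generator used in the paper.
    """
    n_terms = difficulty + 1          # difficulty 1 -> 2 terms, 2 -> 3 terms, ...
    limit = 10 ** difficulty          # difficulty 1 -> up to 10, 2 -> up to 100, ...
    terms = [rng.randint(1, limit) for _ in range(n_terms)]
    ops = [rng.choice(["+", "-", "*"]) for _ in range(n_terms - 1)]

    # Build an expression such as "12 + 7 * 3".
    expr = str(terms[0])
    for op, term in zip(ops, terms[1:]):
        expr += f" {op} {term}"

    # The expression contains only integers and + - *, so evaluating it
    # is safe here; Python applies the usual order of operations.
    answer = eval(expr, {"__builtins__": {}})
    return f"What is {expr}?", str(answer)

if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    for level in (1, 2, 3):
        question, answer = make_problem(level, rng)
        print(level, question, answer)
```

Because every example is produced by code, a generator like this can emit as many training problems as needed, and raising the difficulty value yields longer expressions with bigger numbers.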

The data used by DeepMind was based on the U.K. national school mathematics curriculum for teenagers, which covers arithmetic, calculus, algebra, measurement, numbers, and probability.

All neural networks were given a 40-item math test. Most of the algorithms had difficulty understanding the questions, mainly when they tried to translate and analyze functions, symbols, numbers, and words. The AI models were able to solve only 14 of the 40 questions, or 35 percent of the test.

DeepMind researchers reported that they do not yet have an explanation for why the AI models flunked the test. However, they believe the algorithms’ performance was tied to the process the models used to calculate the values in each question.

The researchers hope that, despite the failure of the program, more AI scientists will be inspired to develop new neural architectures for AI.