New Algorithm Can Speed Up Deep Learning Technology

Computer scientists at Rice University have overcome a significant obstacle in the artificial intelligence industry. They managed to speed up deep learning technology without specialized hardware such as GPUs.

Various companies across the world depend on deep learning technology to power multiple features.

For instance, digital assistants like Alexa and Siri use this form of artificial intelligence. Similarly, deep learning plays a significant role in product recommendation systems and facial recognition, among other applications.

As a result, these businesses invest heavily in GPUs and other specialized hardware to implement deep learning. Just last year, Nvidia, the maker of Tesla V100 Tensor Core GPUs, reported a 41 percent increase in its Q4 revenue compared with the previous year.

Now, researchers at Rice University have demonstrated how to implement deep learning without GPUs. Their approach relies on an algorithm called the “sub-linear deep learning engine,” or SLIDE for short.

In a statement about the project, Anshumali Shrivastava, an assistant professor in Rice’s Brown School of Engineering, said:

“Our tests show that SLIDE is the first smart algorithmic implementation of deep learning on CPU that can outperform GPU hardware acceleration on industry-scale recommendation datasets with large fully connected architectures.”

Shrivastava invented SLIDE along with graduate students Beidi Chen and Tharun Medini.

Using SLIDE to Implement Deep Learning Technology

The standard back-propagation training technique for deep neural networks requires matrix multiplication. Unsurprisingly, that makes it an ideal workload for GPUs.
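To make the matrix-multiplication point concrete, here is a minimal sketch of the forward pass of one fully connected layer in plain Python. The function name `dense_forward` and the toy numbers are illustrative assumptions, not anything taken from the researchers' code. The point is simply that every output neuron touches every input, so the work grows with the product of the two sizes.

```python
def dense_forward(weights, inputs):
    """Compute one fully connected layer: outputs[j] = sum_i weights[j][i] * inputs[i].

    Hypothetical helper for illustration; also counts the multiply-adds performed.
    """
    outputs = []
    multiply_adds = 0
    for row in weights:  # one row of weights per output neuron
        total = 0.0
        for w, x in zip(row, inputs):
            total += w * x
            multiply_adds += 1
        outputs.append(total)
    return outputs, multiply_adds


# Toy layer: 2 output neurons, 3 inputs (made-up values).
weights = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
inputs = [1.0, 0.0, 2.0]
outputs, multiply_adds = dense_forward(weights, inputs)
print(outputs, multiply_adds)  # [7.0, 0.5] 6
```

With 2 neurons and 3 inputs the layer needs 6 multiply-adds. Scale both dimensions into the thousands or millions and the appeal of massively parallel hardware becomes clear.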

The researchers, however, took a fundamentally different approach with SLIDE. They turned neural network training into a search problem that could be solved with hash tables.

Hashing rose to prominence with internet search in the 1990s, although the technique itself dates back decades earlier.

It’s a data-indexing technique that uses numerical methods to encode large amounts of information as a string of digits called a hash. A hash table is a list of hashes that you can search quickly.

Using hash tables this way significantly reduces the computational overhead of SLIDE compared with back-propagation training.
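The idea of replacing a full matrix multiplication with a hash-table lookup can be sketched roughly as follows. This is not the researchers' implementation, and the article does not say which hash function SLIDE uses. The sketch assumes a SimHash-style scheme (sign bits of random projections), and every name and size in it is hypothetical. Neurons are bucketed by the hash of their weight vectors, and an input is hashed the same way to retrieve only the neurons in its bucket.

```python
import random


def simhash(vector, planes):
    """Map a vector to a tuple of sign bits, one bit per random hyperplane."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )


def build_table(neuron_weights, planes):
    """Bucket neuron ids by the hash of their weight vectors."""
    table = {}
    for neuron_id, w in enumerate(neuron_weights):
        table.setdefault(simhash(w, planes), []).append(neuron_id)
    return table


# Assumed toy sizes, chosen only for illustration.
rng = random.Random(0)
dim, num_neurons, num_bits = 16, 1000, 6
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
neuron_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_neurons)]
table = build_table(neuron_weights, planes)

# Hash an input and look up the neurons that share its bucket.
x = [rng.gauss(0, 1) for _ in range(dim)]
active = table.get(simhash(x, planes), [])

dense_cost = num_neurons * dim   # multiply-adds if every neuron is evaluated
sparse_cost = len(active) * dim  # multiply-adds if only retrieved neurons are evaluated
print(f"{len(active)} of {num_neurons} neurons retrieved; "
      f"{sparse_cost} vs {dense_cost} multiply-adds")
```

Vectors pointing in similar directions tend to land in the same bucket, so the lookup favors neurons likely to respond strongly to the input. Evaluating only those neurons is where the savings in this sketch come from.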

Shrivastava and his colleagues tested a workload with over 100 million parameters on a top-of-the-line GPU platform that costs $100,000.

“We trained it with the best (software) package out there, Google’s TensorFlow, and it took 3 1/2 hours to train,” Shrivastava said. “We then showed that our new algorithm could do the training in one hour, not on GPUs but a 44-core Xeon-class CPU.”
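For a sense of scale, the two training times quoted above work out as follows. This is simple arithmetic on the quoted figures only; the variable names are illustrative.

```python
tensorflow_gpu_hours = 3.5  # "3 1/2 hours" quoted for TensorFlow on the GPU platform
slide_cpu_hours = 1.0       # "one hour" quoted for SLIDE on the 44-core CPU

speedup = tensorflow_gpu_hours / slide_cpu_hours
time_saved_fraction = 1 - slide_cpu_hours / tensorflow_gpu_hours
print(f"speedup: {speedup:.1f}x, time saved: {time_saved_fraction:.0%}")
# speedup: 3.5x, time saved: 71%
```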

At first, the researchers ran into a problem that caused a lot of cache misses. However, collaborators from Intel solved the problem and improved the algorithm’s results by 50 percent.

“The whole message is, ‘Let’s not be bottlenecked by multiplication matrix and GPU memory,’” Chen said. “Ours may be the first algorithmic approach to beat GPU, but I hope it’s not the last.”