Bing Improves Search Experience Through BERT and Azure GPUs

In a recent blog post, Microsoft announced that Bing has been using BERT along with Azure GPUs to improve the search experience.

Last month, Google updated its search algorithm to include Bidirectional Encoder Representations from Transformers, or BERT. It’s essentially a natural language model that analyzes the intent behind a user’s search to deliver the most relevant result for a query.

It turns out Google is not the only search engine using BERT to provide the most relevant result. According to Microsoft, Bing has been using the transformer models for almost seven months.

A program manager at Bing, Jeffrey Zhu, wrote in a blog post:

“Starting from April of this year, we used large transformer models to deliver the largest quality improvements to our Bing customers in the past year.”

Here’s how the Microsoft-owned search engine improved the search experience.

How BERT Improved Search Experience on Bing

How Bing provides relevant results to improve search experience — Image Credit: Microsoft

In the query above, the user clearly wants to learn about actions to be taken after a concussion.

In the “Before” image, the search engine got the intent wrong and provided information on causes and symptoms. In the “After” image, the BERT-powered search engine understood the user’s intent and delivered a more useful result.

Microsoft says that it now applies the models to every Bing search query globally. So, wherever users are in the world, the search engine should provide a relevant result that reflects their search intent.

Optimizing the BERT Model

To deliver a fast search experience to users, Microsoft says it used GPUs on Azure Virtual Machines.

Unlike previous deep neural network architectures, BERT relies on massively parallel computation. Since Microsoft designed its Azure N-series Virtual Machines for that purpose, they were a natural fit.

Using more than 2,000 Azure Virtual Machines, the tech company was able to serve over 1 million BERT inferences per second worldwide.
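
As a rough back-of-the-envelope check on those figures, the sketch below divides the quoted throughput by the quoted fleet size. Both numbers are stated as lower bounds ("over" and "more than"), and the even-load assumption, the function name, and the constants are illustrative, so the result is an order-of-magnitude estimate rather than a figure Microsoft reported.

```python
def per_vm_throughput(total_inferences_per_sec: float, vm_count: int) -> float:
    """Average inferences per second per VM, assuming load is spread evenly."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_inferences_per_sec / vm_count


# Figures quoted in the article; both are stated only as lower bounds.
TOTAL_INFERENCES_PER_SEC = 1_000_000
VM_COUNT = 2_000

avg = per_vm_throughput(TOTAL_INFERENCES_PER_SEC, VM_COUNT)
print(f"~{avg:.0f} inferences per second per VM")
```

Under that assumption, each virtual machine would handle on the order of 500 inferences per second.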