
What You Should Know About Google's New SMITH Algorithm

Google’s new SMITH algorithm is similar to BERT in many ways, just better.

Several natural language processing and information retrieval problems involve semantic matching. It’s a technique that’s used to identify semantically related information.

For example, such a model could detect that a document labeled “car” is equivalent to another labeled “automobile.”
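To make the idea concrete, here is a minimal sketch of how semantic matching is often scored: each label is turned into a vector, and two labels count as related when their vectors point in a similar direction. The three-dimensional vectors below are invented for illustration and are not taken from any Google model.

```python
import math

# Hypothetical toy embeddings, invented for this example only.
# A real model learns vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_match(label_a, label_b):
    """Score how closely two labels match, using the toy embeddings."""
    return cosine_similarity(TOY_EMBEDDINGS[label_a], TOY_EMBEDDINGS[label_b])


print(semantic_match("car", "automobile") > semantic_match("car", "banana"))  # True
```

With these made-up vectors, “car” and “automobile” score close to 1, while “car” and “banana” score close to 0.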

Sounds simple enough, right? However, semantic matching applications extend beyond merely identifying words.

Today, search algorithm components such as BERT and other Transformer-based models rely on this technique to understand the nuances and contexts of words. It’s why Google uses BERT to organize Top Stories and Featured Snippets.

However, BERT is far from perfect.

According to Google, the NLP model focuses primarily on matching short texts such as a few sentences or paragraphs. As a result, it may struggle with long-form documents, which have several essential applications.

These include:

  • News recommendation
  • Related article recommendation
  • Document clustering

To address this problem, the search giant published a research paper proposing a new model for long-form content matching. It’s called the Siamese Multi-depth Transformer-based Hierarchical Encoder, or SMITH for short.

The Google paper reads:

“In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching.”
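The name hints at the overall shape of the model. “Siamese” means the same encoder processes both documents being compared, and “hierarchical” means each document is encoded in stages: first its blocks of sentences, then the document as a whole. The sketch below only mirrors that shape. The hashed bag-of-words encoder, the averaging step, and every function name are simplified stand-ins chosen for illustration, not the Transformer layers Google describes.

```python
import math
import zlib

DIM = 16  # toy vector size, chosen arbitrarily for this sketch


def encode_block(block_text):
    """Toy stand-in for a block-level encoder: hashed bag of words."""
    vec = [0.0] * DIM
    for word in block_text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    return vec


def encode_document(blocks):
    """Toy stand-in for a document-level encoder: average the block vectors."""
    block_vecs = [encode_block(block) for block in blocks]
    return [sum(column) / len(block_vecs) for column in zip(*block_vecs)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def siamese_score(doc_a_blocks, doc_b_blocks):
    """Siamese setup: both documents pass through the same encoder."""
    return cosine(encode_document(doc_a_blocks), encode_document(doc_b_blocks))


doc = ["Google published a new model.", "It matches long documents."]
print(siamese_score(doc, doc))
```

Because both inputs share one encoder, a document compared with itself scores approximately 1.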

So, how does SMITH compare to BERT?

Google’s SMITH Algorithm vs. BERT: A Basic Comparison

As noted earlier, BERT is trained to understand words within the context of sentences. SMITH, on the other hand, can capture sentence-level semantic relations within a document.

In other words, Google trained the new model to match passages within the context of the entire content. But how?

First, the search giant pre-trained SMITH with the masked word language modeling task used by BERT. That way, it could predict randomly masked words from the context of the surrounding sentence.
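As a rough illustration of what a masked word task looks like, the sketch below hides a few words in a sentence and records the answers a model would have to recover. The 15% masking rate is the figure BERT uses. Applying the same rate here, along with the function name and the fixed seed, is an assumption made for the example.

```python
import random

MASK = "[MASK]"


def mask_words(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens and return them as prediction targets.

    mask_rate and seed are illustrative defaults, not values from Google.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))

    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # the word the model must predict
        masked[pos] = MASK
    return masked, targets


tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_words(tokens)
print(masked)
print(targets)
```

During training, the model sees only the masked sentence and is scored on how well it fills in each hidden position.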

However, what made the difference was pre-training the model with a novel masked sentence-block language modeling task. With that, SMITH learned to predict masked blocks of sentences from the rest of a long-form document.
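One way to picture a masked sentence-block task: group a document's sentences into blocks, hide one entire block, and ask the model to work out what belongs there from the blocks that remain. The sketch below only prepares such a training example. The block size of two sentences, the placeholder text, and the function names are assumptions for illustration.

```python
import random

BLOCK_MASK = "[MASKED BLOCK]"


def split_into_blocks(sentences, sentences_per_block=2):
    """Group consecutive sentences into fixed-size blocks."""
    return [
        sentences[i:i + sentences_per_block]
        for i in range(0, len(sentences), sentences_per_block)
    ]


def mask_sentence_block(blocks, seed=0):
    """Hide one whole block and return it as the prediction target."""
    rng = random.Random(seed)
    hidden_index = rng.randrange(len(blocks))
    masked = [
        BLOCK_MASK if i == hidden_index else " ".join(block)
        for i, block in enumerate(blocks)
    ]
    target = " ".join(blocks[hidden_index])
    return masked, hidden_index, target


sentences = [
    "SMITH targets long documents.",
    "It reads them block by block.",
    "Each block holds a few sentences.",
    "One block is hidden during training.",
    "The model must recover it.",
]
blocks = split_into_blocks(sentences)
masked, hidden_index, target = mask_sentence_block(blocks)
print(masked)
print(target)
```

Compared with masking single words, this forces the model to reason about how whole passages relate to one another.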

In several benchmark tests for long-form document matching, Google noted that the new model outperforms previous ones, including BERT.

The paper reads:

“Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048.”
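To see what that jump means in practice, consider a 3,000-token article. The sketch below compares how much of it survives under each limit and shows how the longer input can be cut into blocks. The 512 and 2048 limits come from the quote above. The 32-token block length and the helper names are assumptions for illustration.

```python
BERT_MAX_TOKENS = 512    # limit quoted for the BERT-based baselines
SMITH_MAX_TOKENS = 2048  # limit quoted for SMITH


def truncate(tokens, max_tokens):
    """Keep only the tokens that fit within the model's input limit."""
    return tokens[:max_tokens]


def split_into_blocks(tokens, block_len=32):
    """Cut a token list into fixed-length blocks (block_len is assumed)."""
    return [tokens[i:i + block_len] for i in range(0, len(tokens), block_len)]


# A stand-in for a tokenized 3,000-token article.
document = [f"tok{i}" for i in range(3000)]

bert_view = truncate(document, BERT_MAX_TOKENS)
smith_view = truncate(document, SMITH_MAX_TOKENS)
smith_blocks = split_into_blocks(smith_view)

print(len(bert_view), len(smith_view), len(smith_blocks))  # 512 2048 64
```

In this example, the shorter limit covers roughly a sixth of the article, while the longer one covers about two thirds.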

Indeed, the idea of SMITH outperforming previous state-of-the-art models such as BERT is intriguing. However, it’s unlikely that the new model would replace the old one.

Instead, Google could use SMITH alongside BERT to understand both long and short queries and documents.

Read the original research paper here.

Sumbo Bello

Sumbo Bello is a creative writer who enjoys creating data-driven content for news sites. In his spare time, he plays basketball and listens to Coldplay.
