A shortage of training data is a major challenge in the field of natural language processing (NLP).
To learn a specific task, deep learning-based NLP models need enormous numbers of human-annotated training examples.
Researchers have been looking for ways to overcome this challenge by tapping the vast amounts of unlabeled text available on the web, an approach known as pre-training.
Models are first pre-trained on large text corpora (such as Wikipedia dumps) and then fine-tuned for a specific NLP task using labeled data. That is the approach behind Google's NLP model BERT and its newest, lite version, ALBERT.
A pre-trained NLP model already understands the language and doesn't have to start from scratch on the task at hand. Because the pre-training stage is unsupervised, such a model can be fine-tuned to outperform traditional models trained from the ground up.
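To make that workflow concrete, here is a minimal fine-tuning sketch, assuming the Hugging Face transformers and PyTorch packages (the article itself doesn't prescribe a toolkit). It loads a BERT checkpoint that has already been pre-trained on unlabeled text and fine-tunes it on a tiny, made-up labeled dataset.

```python
# Minimal fine-tuning sketch; assumes the Hugging Face `transformers`
# and `torch` packages (not named in the article).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load a model that has already been pre-trained on unlabeled text.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy labeled dataset for the downstream task (sentiment classification here).
texts = ["A wonderful, well-paced film.", "Dull and far too long."]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A single fine-tuning step: labeled data is only needed at this stage,
# because the general language knowledge comes from pre-training.
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
```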
In October 2018, Google open-sourced a new method for NLP model pre-training called BERT, short for Bidirectional Encoder Representations from Transformers.
BERT, Google said, allows anyone in the world to train their own state-of-the-art NLP model for a variety of tasks in a few hours on a single GPU. On a Cloud Tensor Processing Unit (TPU), such as Google's, training takes even less time, about 30 minutes.
Meet Google’s new NLP Model – ALBERT
Now, Google has launched a lite version of BERT, called ALBERT, introduced as A Lite BERT for Self-Supervised Learning of Language Representations.
With this upgrade to BERT, Google’s new deep learning-based NLP model achieved SOTA (state-of-the-art) performance on 12 popular NLP tasks, like question-answering and reading comprehension.
In the accompanying paper, the team presents “two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT.”
As the name suggests, ALBERT is a leaner version of BERT. Basically, it’s the same language representation model, with about the same accuracy, but much faster and with 89 percent fewer parameters.
Thanks to the two parameter-reduction techniques, the base ALBERT model comes with only 12M parameters, while the base BERT model has 108M. Compared to BERT's 82.3% average, ALBERT achieves an average of 80.1% accuracy on several NLP benchmarks.
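These figures can be sanity-checked directly. The sketch below, again assuming the Hugging Face transformers library, loads the base BERT and ALBERT checkpoints and compares their parameter counts; the exact numbers printed may differ slightly from those quoted above.

```python
# Compare parameter counts of the base BERT and ALBERT checkpoints.
# Assumes the Hugging Face `transformers` package (not mentioned in the article).
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
albert = AutoModel.from_pretrained("albert-base-v2")

bert_params = sum(p.numel() for p in bert.parameters())
albert_params = sum(p.numel() for p in albert.parameters())

print(f"BERT-base:   {bert_params / 1e6:.0f}M parameters")
print(f"ALBERT-base: {albert_params / 1e6:.0f}M parameters")
print(f"Reduction:   {100 * (1 - albert_params / bert_params):.0f}%")
```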
The team also trained an ALBERT-xxlarge (double-extra-large) model, which achieves an overall 30% parameter reduction compared to the BERT-large model while performing significantly better on benchmarks.
“The success of ALBERT,” said Google, “demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations. By focusing improvement efforts on these aspects of the model architecture, it is possible to greatly improve both the model efficiency and performance on a wide range of NLP tasks.”
Besides the English-language version of ALBERT, Google has also released Chinese-language ALBERT models.
To power further NLP research, Google has made ALBERT open source, and both ALBERT's code and pre-trained models are available on GitHub.
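For readers who want to try a released checkpoint, the sketch below loads ALBERT through the Hugging Face transformers pipeline API (an assumption made for brevity; the official GitHub release ships TensorFlow code) and asks it to fill in a masked word, the kind of self-supervised objective ALBERT is pre-trained on.

```python
# Quick test of a released ALBERT checkpoint on the fill-mask task.
# Assumes the Hugging Face `transformers` package; the official release
# on GitHub provides TensorFlow code instead.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="albert-base-v2")

mask = fill_mask.tokenizer.mask_token
for prediction in fill_mask(f"The capital of France is {mask}."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```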