A shortage of training data is a major challenge in the field of natural language processing (NLP).
To learn a specific task, deep learning-based NLP models need enormous numbers of human-annotated training examples.
Researchers have been looking for ways to overcome this challenge by tapping the vast amounts of unlabeled text available on the web, an approach known as pre-training.
Models are first pre-trained on large text corpora (such as Wikipedia dumps) and then fine-tuned for a specific NLP task using labeled data. That is the approach behind Google's NLP model BERT and its newest, lite version, ALBERT.
A pre-trained NLP model already understands the language and doesn't have to start from scratch on the task at hand. Because the pre-training stage is unsupervised, such a model can be fine-tuned to outperform traditional models trained from the ground up.
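To make that workflow concrete, here is a minimal fine-tuning sketch, assuming the Hugging Face transformers and PyTorch packages (the article itself doesn't prescribe a toolkit). It loads a BERT checkpoint that has already been pre-trained on unlabeled text and fine-tunes it on a tiny, made-up labeled dataset.

```python
# Minimal fine-tuning sketch; assumes the Hugging Face `transformers`
# and `torch` packages (not named in the article).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load a model that has already been pre-trained on unlabeled text.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy labeled dataset for the downstream task (sentiment classification here).
texts = ["A wonderful, well-paced film.", "Dull and far too long."]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A single fine-tuning step: labeled data is only needed at this stage,
# because the general language knowledge comes from pre-training.
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
```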
In October 2018, Google open-sourced a new method for NLP model pre-training called BERT, short for Bidirectional Encoder Representations from Transformers.
BERT, Google said, allows anyone in the world to train their own state-of-the-art NLP model for a variety of tasks in a few hours on a single GPU. On a Cloud Tensor Processing Unit (TPU), such as Google's, training takes even less time, about 30 minutes.
Meet Google’s new NLP Model – ALBERT
Now, Google has launched a lite version of BERT, called ALBERT, introduced as A Lite BERT for Self-Supervised Learning of Language Representations.
With this upgrade to BERT, Google’s new deep learning-based NLP model achieved SOTA (state-of-the-art) performance on 12 popular NLP tasks, like question-answering and reading comprehension.
In the accompanying paper, the team presents “two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT.”
As the name suggests, ALBERT is a leaner version of BERT. Basically, it’s the same language representation model, with about the same accuracy, but much faster and with 89 percent fewer parameters.
Thanks to the two parameter-reduction techniques, the base ALBERT model comes with only 12M parameters, while the base BERT model has 108M. Compared to BERT's 82.3% average, ALBERT achieves an average of 80.1% accuracy on several NLP benchmarks.
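These figures can be sanity-checked directly. The sketch below, again assuming the Hugging Face transformers library, loads the base BERT and ALBERT checkpoints and compares their parameter counts; the exact numbers printed may differ slightly from those quoted above.

```python
# Compare parameter counts of the base BERT and ALBERT checkpoints.
# Assumes the Hugging Face `transformers` package (not mentioned in the article).
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
albert = AutoModel.from_pretrained("albert-base-v2")

bert_params = sum(p.numel() for p in bert.parameters())
albert_params = sum(p.numel() for p in albert.parameters())

print(f"BERT-base:   {bert_params / 1e6:.0f}M parameters")
print(f"ALBERT-base: {albert_params / 1e6:.0f}M parameters")
print(f"Reduction:   {100 * (1 - albert_params / bert_params):.0f}%")
```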
The team also trained an ALBERT-xxlarge (double-extra-large) model, which achieves an overall 30% parameter reduction compared to the BERT-large model while performing significantly better on benchmarks.
“The success of ALBERT,” said Google, “demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations. By focusing improvement efforts on these aspects of the model architecture, it is possible to greatly improve both the model efficiency and performance on a wide range of NLP tasks.”
Besides the English-language version of ALBERT, Google has also released Chinese-language ALBERT models.
To power further NLP research, Google has made ALBERT open source, and both ALBERT's code and pre-trained models are available on GitHub.
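For readers who want to try a released checkpoint, the sketch below loads ALBERT through the Hugging Face transformers pipeline API (an assumption made for brevity; the official GitHub release ships TensorFlow code) and asks it to fill in a masked word, the kind of self-supervised objective ALBERT is pre-trained on.

```python
# Quick test of a released ALBERT checkpoint on the fill-mask task.
# Assumes the Hugging Face `transformers` package; the official release
# on GitHub provides TensorFlow code instead.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="albert-base-v2")

mask = fill_mask.tokenizer.mask_token
for prediction in fill_mask(f"The capital of France is {mask}."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```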