For a while now, computer systems have been able to automatically generate captions for news images.
Such a system analyzes an image and comes up with a simple description such as “a child is running.” Although some of these image captioning techniques perform relatively well, they are still somewhat limited.
And here’s why.
In real-life scenarios, images come with personal and often unique stories. For example, a running child’s picture may have been captured at a birthday party or during a family picnic.
Unfortunately, systems for generating captions are rarely that descriptive or context-aware. As a result, their image captions are usually generic and uninteresting.
Also, images on news websites and blogs come with articles that provide further information about the picture. This could include details about a specific event or the person who took the picture.
However, most existing systems for generating image captions don’t consider this information. Rather than drawing on the accompanying text, they treat the image as an isolated object.
Three researchers — Alasdair Tran, Alexander Mathews, and Lexing Xie — from the Australian National University set out to address this issue.
In a statement to the press, one of the researchers, Lexing Xie, said:
“Our lab has already done work that makes image captions sentimental and romantic, and this work is a continuation on a different dimension. In this new direction, we wanted to focus on the context.”
In a paper recently published as a preprint, the researchers introduced a system that generates context-aware captions for news images.
Context-Aware Caption-Generating System for News Images
Unlike previously developed image captioning systems, the new model doesn’t ignore unusual names in a text.
Instead, it breaks them down into subparts through a technique called byte pair encoding and analyzes those subparts. As a result, the system can generate captions with an unrestricted vocabulary.
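To give a rough idea of how byte pair encoding splits words into subparts, here is a toy Python sketch. It is not the researchers' implementation: the function names (learn_bpe, merge_pair, encode) and the tiny word list are made up for illustration, and real systems learn tens of thousands of merges from a large corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Every word starts out as a sequence of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subparts by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical training words; a real tokenizer would see millions.
merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(encode("lowly", merges))  # a word never seen during training
```

Because any word can always fall back to individual characters, nothing is ever out of vocabulary. That is the property that lets a captioning model handle rare names.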
The researchers also adopted a relatively new type of architecture known as the transformer. Its critical algorithmic component, the one that made all the difference, is the attention mechanism.
Thanks to this component, the system can compute similarities between words in the caption and the image context. “This is done using functions that generalize the vector inner products,” Xie said.
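For readers curious about what "computing similarities" looks like in practice, below is a minimal sketch of scaled dot-product attention, the standard form used in transformers. The article doesn't specify the exact variant in the researchers' model, so treat this as an assumption. The names (softmax, attend) and the toy vectors are purely illustrative.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  vector for the caption word being generated
    keys:   one vector per context item (article word, image patch, ...)
    values: the information carried by each context item
    """
    scale = math.sqrt(len(query))
    # Similarity = inner product between the query and each key, scaled.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy 2-dimensional vectors, made up for illustration.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = keys
weights, context = attend(query, keys, values)
print(weights)  # the first key, most similar to the query, gets the largest weight
```

The context item whose key points in the same direction as the query receives the most weight, so the model effectively "looks at" the most relevant part of the article or image when choosing each caption word.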
Since the majority of images published in newspapers feature people, the team added modules for face and object detection to their model.
Tran noted:
“Getting a machine to think as humans do has always been an important goal of artificial intelligence research. We were able to get one step closer to this goal by building a model that can incorporate real-world knowledge about names in the existing text.”
An initial evaluation of the image captioning system showed promising results.
It was able to analyze long texts and identify the most salient parts. Using this analysis, it was able to generate a context-aware caption for news images.
What’s more, the captions from the model resemble the writing style of the New York Times, the principal source of its training data.
As you may have guessed, the new system could help journalists and other media experts create captions for news images faster and more efficiently.
For now, the researchers are working on creating a model that’ll identify the best place to insert the image within a text. This could ultimately speed up the news publishing process.