Verizon Media has launched a new coronavirus research search engine built on Vespa, its open-source big data engine.
Access to information is essential during the current COVID-19 pandemic. Not only are we interested in how the virus makes us ill, but we also want to know what to do about it.
Luckily, researchers have published over 50,000 articles to address these questions. Yes, that’s a lot of information, and it raises the question: how do we make sense of it all?
That’s where Verizon Media’s Vespa comes in. Vespa is an open-source big data processing engine, and the company has used it to build a search engine for academic research on the coronavirus.
In a statement to the press, Verizon Media CTO Rathi Murthy said:
“Given our experience with big data at Yahoo, we thought the best way to help was to index the data set and develop a search engine that lets researchers filter and search the 45,000 plus scholarly articles using keywords and simple search terms.”
Here’s how it works.
Using Vespa to Make Sense of Coronavirus Research
The engine works on top of the COVID-19 Open Research Dataset (CORD-19).
With this dataset, medical researchers can conveniently find existing work and develop new insights on ways to fight the virus. The dataset is updated as new papers appear in peer-reviewed publications and archival services.
These include bioRxiv, which hosts biological sciences preprints, and medRxiv, which hosts health sciences preprints. Other documents in the database also link to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications.
Unlike some search engines on the internet today, Vespa combines several methods to find the best answer. The Verizon search engine uses a pre-trained language model called scibert-nli to search texts.
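To make the idea of combining several methods concrete, here is a minimal Python sketch that blends a keyword match with a vector similarity score. It is an illustration only, not Vespa's actual ranking: the function names, the equal weighting, and the three-number vectors (stand-ins for what a model such as scibert-nli might produce) are all assumptions.

```python
import math

def keyword_score(query, text):
    """Fraction of query terms that appear in the text (a crude keyword match)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Blend both signals; alpha is a hypothetical weight for the keyword part."""
    scored = []
    for doc in docs:
        score = (alpha * keyword_score(query, doc["title"])
                 + (1 - alpha) * cosine(query_vec, doc["vec"]))
        scored.append((score, doc["title"]))
    return sorted(scored, reverse=True)

# Toy documents with made-up vectors (assumption: a real system would get
# these from a language model, not write them by hand).
docs = [
    {"title": "Coronavirus transmission in hospitals", "vec": [0.9, 0.1, 0.0]},
    {"title": "Seasonal influenza vaccine uptake", "vec": [0.1, 0.9, 0.1]},
    {"title": "Airborne spread of SARS-CoV-2", "vec": [0.8, 0.2, 0.1]},
]

ranked = hybrid_rank("coronavirus transmission", [1.0, 0.0, 0.0], docs)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

In this toy example, the third article shares no keywords with the query but still ranks above the influenza article because its vector is close to the query's. That is the kind of match a keyword-only search would miss.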
Normally, Verizon uses Vespa for applications that range from article recommendations to ad targeting. However, the company has now keyword-indexed COVID-19 articles to provide easy access to information related to the disease.
More tech-savvy researchers can also access the data via the CORD-19 application programming interface (API).
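For readers curious what API access might look like, the following Python sketch builds a search request URL. The host name is a hypothetical placeholder, and the `/search/` path with `query` and `hits` parameters follows Vespa's general query API conventions; whether the CORD-19 API exposes exactly these is an assumption. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Hypothetical placeholder host; substitute the real API address.
BASE_URL = "https://cord19-api.example.invalid"

def build_search_url(query, hits=5, base_url=BASE_URL):
    """Build a search URL; parameter names are assumed from Vespa's query API."""
    params = {"query": query, "hits": hits}
    return f"{base_url}/search/?{urlencode(params)}"

url = build_search_url("coronavirus incubation period", hits=3)
print(url)
```

A researcher could pass the resulting URL to any HTTP client and parse the response, which Vespa applications typically return as JSON.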
They are in such great numbers