CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization

Esteva, Andre; Kale, Anuprit; Paulus, Romain; Hashimoto, Kazuma; Yin, Wenpeng; Radev, Dragomir; Socher, Richard

Jun-16-2020–arXiv.org Artificial Intelligence

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. As of May 2020, 128,000 coronavirus-related publications had been collected through the COVID-19 Open Research Dataset Challenge [23]. Here we present CO-Search, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis. The retriever is built from a Siamese-BERT [18] encoder that is linearly composed with a TF-IDF vectorizer [19] and reciprocal-rank fused [5] with a BM25 vectorizer. The ranker is composed of a multi-hop question-answering module [1] that, together with a multi-paragraph abstractive summarizer, adjusts the retriever scores. To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations, creating 1.3 million (citation title, paragraph) tuples for training the encoder. We evaluate our system on the data of the TREC-COVID [22] information retrieval challenge. CO-Search obtains top performance on the datasets of the first and second rounds across several key metrics: normalized discounted cumulative gain, precision, mean average precision, and binary preference.
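The abstract describes the retriever as two fusion steps: dense Siamese-BERT scores are linearly combined with TF-IDF scores, and the resulting ranking is merged with a BM25 ranking by reciprocal rank fusion. The Python below is a minimal sketch of those two steps under stated assumptions: per-document scores from each scorer are already computed, and the mixing weight `alpha = 0.5`, the RRF constant `k = 60` (a common default for RRF), the toy scores, and all function names are hypothetical choices, not values or code from CO-Search.

```python
from typing import Dict, List

def linear_combine(dense: Dict[str, float], tfidf: Dict[str, float],
                   alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical linear composition: alpha * dense + (1 - alpha) * TF-IDF.

    A document missing from one scorer contributes 0.0 for that scorer.
    """
    docs = set(dense) | set(tfidf)
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * tfidf.get(d, 0.0)
            for d in docs}

def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Document ids by descending score; ties broken by id so output is deterministic."""
    return sorted(scores, key=lambda d: (-scores[d], d))

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) per document, rank starting at 1."""
    fused: Dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused

# Toy scores for illustration only.
dense = {"d1": 0.9, "d2": 0.3, "d3": 0.6}   # e.g. cosine similarity from a Siamese encoder
tfidf = {"d1": 0.1, "d2": 0.9, "d3": 0.5}   # e.g. TF-IDF cosine similarity
bm25 = {"d1": 3.1, "d2": 7.5, "d4": 5.0}    # raw BM25 scores, on a different scale

mixed = to_ranking(linear_combine(dense, tfidf))   # ['d2', 'd3', 'd1']
final = to_ranking(reciprocal_rank_fusion([mixed, to_ranking(bm25)]))
print(final)  # ['d2', 'd1', 'd3', 'd4']
```

Because RRF looks only at ranks, it needs no normalization between BM25 scores and the dense/TF-IDF mixture, whose scales differ; the linear step, by contrast, assumes its two inputs are on comparable scales.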
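The training-data step can be pictured as follows: paragraphs and cited works form the two sides of a bipartite graph, and every paragraph-to-citation edge yields one (citation title, paragraph) tuple. The sketch below assumes a simplified per-paper dict whose field names (`body_text`, `cite_spans`, `ref_id`, `bib_entries`, `title`) are loosely modeled on the CORD-19 JSON parses; the exact schema, the de-duplication choice, the toy paper, and both function names are assumptions, not CO-Search's actual pipeline.

```python
from typing import Dict, List, Set, Tuple

def build_bipartite_edges(paper: dict) -> Set[Tuple[int, str]]:
    """Edges (paragraph index, ref_id) linking each paragraph to the works it cites.

    Assumed schema: paper["body_text"] is a list of {"text", "cite_spans"} dicts and
    each cite span may carry a "ref_id" pointing into paper["bib_entries"].
    """
    bib = paper.get("bib_entries", {})
    edges = set()  # a set, so a work cited twice in one paragraph yields one edge
    for i, para in enumerate(paper.get("body_text", [])):
        for span in para.get("cite_spans", []):
            ref_id = span.get("ref_id")
            if ref_id in bib:  # drop unresolved citations (e.g. ref_id None)
                edges.add((i, ref_id))
    return edges

def citation_paragraph_pairs(paper: dict) -> List[Tuple[str, str]]:
    """Turn each edge into a (citation title, paragraph text) training tuple."""
    paragraphs = paper.get("body_text", [])
    bib: Dict[str, dict] = paper.get("bib_entries", {})
    pairs = []
    for i, ref_id in sorted(build_bipartite_edges(paper)):
        title = bib[ref_id].get("title", "").strip()
        if title:  # skip references without a usable title
            pairs.append((title, paragraphs[i]["text"]))
    return pairs

# Toy paper with placeholder text, for illustration only.
paper = {
    "body_text": [
        {"text": "Paragraph one.", "cite_spans": [{"ref_id": "BIBREF0"}, {"ref_id": "BIBREF0"}]},
        {"text": "Paragraph two.", "cite_spans": [{"ref_id": "BIBREF1"}, {"ref_id": None}]},
    ],
    "bib_entries": {"BIBREF0": {"title": "Title A"}, "BIBREF1": {"title": "Title B"}},
}
print(citation_paragraph_pairs(paper))
# [('Title A', 'Paragraph one.'), ('Title B', 'Paragraph two.')]
```

Run over every paper in a corpus and concatenated, pairs like these would form a tuple collection of the kind the abstract describes; how CO-Search filters, samples, or pairs them with negatives is not specified here.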
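For readers unfamiliar with the evaluation metrics named in the abstract, the sketch below computes nDCG@k, precision@k, and binary average precision for a single query from graded relevance judgments. It uses one common formulation (linear gain, log2(rank + 1) discount, unjudged documents treated as non-relevant); the official TREC-COVID numbers are produced by trec_eval, whose conventions may differ in detail, and binary preference (bpref) is not shown. The function names and toy judgments are hypothetical.

```python
import math
from typing import Dict, List

def dcg_at_k(ranked: List[str], qrels: Dict[str, int], k: int) -> float:
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount, unjudged = 0."""
    return sum(qrels.get(doc, 0) / math.log2(rank + 1)
               for rank, doc in enumerate(ranked[:k], start=1))

def ndcg_at_k(ranked: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG normalized by the DCG of an ideal ordering of the judged documents."""
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg_at_k(ranked, qrels, k) / idcg if idcg > 0 else 0.0

def precision_at_k(ranked: List[str], qrels: Dict[str, int], k: int) -> float:
    """Fraction of the top k results judged relevant (grade > 0)."""
    return sum(1 for doc in ranked[:k] if qrels.get(doc, 0) > 0) / k

def average_precision(ranked: List[str], qrels: Dict[str, int]) -> float:
    """Binary AP: sum of precision@rank at each relevant hit, divided by the number
    of relevant documents. MAP is this value averaged over queries."""
    n_relevant = sum(1 for g in qrels.values() if g > 0)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0

# Toy judgments: 2 = highly relevant, 1 = partially relevant, 0 = not relevant.
qrels = {"d1": 2, "d2": 1, "d3": 0}
ranked = ["d2", "d1", "d4"]
print(round(ndcg_at_k(ranked, qrels, 3), 4))   # 0.8597
print(precision_at_k(ranked, qrels, 3))        # 0.6666666666666666
print(average_precision(ranked, qrels))        # 1.0
```

Here both relevant documents are retrieved in the top two, so AP is perfect, while nDCG is below 1.0 because the highly relevant `d1` is ranked after the partially relevant `d2`.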

Country:
- North America > United States (0.69)

Genre:
- Research Report (0.64)

Industry:
- Government > Regional Government
  - North America Government > United States Government (0.46)
- Health & Medicine > Therapeutic Area
  - Immunology (1.00)
  - Infections and Infectious Diseases (1.00)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
