AITopics | cord-19

cord-19

LLM-based feature generation from text for interpretable machine learning

Balek, Vojtěch, Sýkora, Lukáš, Sklenák, Vilém, Kliegr, Tomáš

arXiv.org Artificial Intelligence | Sep-11-2024

Existing text representations such as embeddings and bag-of-words are not suitable for rule learning due to their high dimensionality and absent or questionable feature-level interpretability. This article explores whether large language models (LLMs) could address this by extracting a small number of interpretable features from text. We demonstrate this process on two datasets (CORD-19 and M17+) containing several thousand scientific articles from multiple disciplines and a target being a proxy for research impact. An evaluation based on testing for a statistically significant correlation with research impact has shown that Llama 2-generated features are semantically meaningful. We consequently used these generated features in text classification to predict the binary target variable representing the citation rate for the CORD-19 dataset and the ordinal 5-class target representing an expert-awarded grade in the M17+ dataset. Machine-learning models trained on the LLM-generated features provided similar predictive performance to the state-of-the-art embedding model SciBERT for scientific text. The LLM used only 62 features compared to 768 features in SciBERT embeddings, and these features were directly interpretable, corresponding to notions such as article methodological rigor, novelty, or grammatical correctness. As the final step, we extract a small number of well-interpretable action rules. Consistently competitive results obtained with the same LLM feature set across both thematically diverse datasets show that this approach generalizes across domains.

action rule, dataset, llm-generated feature, (17 more...)

arXiv.org Artificial Intelligence

2409.07132

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Czechia > Prague (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

An Information Retrieval and Extraction Tool for Covid-19 Related Papers

Pivetta, Marcos V. L.

arXiv.org Artificial Intelligence | Jan-19-2024

Background: The COVID-19 pandemic has caused severe impacts on health systems worldwide. Its critical nature and the increased interest of individuals and organizations in developing countermeasures to the problem have led to a surge of new studies in scientific journals. Objective: We sought to develop a tool that incorporates, in a novel way, aspects of Information Retrieval (IR) and Information Extraction (IE) applied to the COVID-19 Open Research Dataset (CORD-19). The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers, helping them find reference papers and highlight relevant entities in text. Method: We applied Latent Dirichlet Allocation (LDA) to model, based on research aspects, the topics of all English abstracts in CORD-19. Relevant named entities of each abstract were extracted and linked to the corresponding UMLS concept. Regular expressions and the K-Nearest Neighbors algorithm were used to rank relevant papers. Results: Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers. Nonetheless, we identified that more fine-tuned topic modeling parameters and increased accuracy of the research aspect classifier model could lead to a more accurate and reliable tool. Conclusion: We emphasize the need for new automated tools to help researchers find relevant COVID-19 documents, in addition to automatically extracting useful information contained in them. Our work suggests that combining different algorithms and models could lead to new ways of browsing COVID-19 paper data.

january 31, cord-19, research aspect, (15 more...)

arXiv.org Artificial Intelligence

2401.1643

Country:

South America > Brazil > Rio Grande do Sul (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)
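
The abstract of this entry mentions ranking papers with the K-Nearest Neighbors algorithm on top of LDA topics, without giving details. The sketch below is one possible reading, not the author's implementation: it assumes each paper is already represented by a topic distribution (as LDA would produce) and ranks papers by cosine distance to a query's topic vector. The paper ids, the three-topic vectors, and the choice of cosine distance are all illustrative assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def k_nearest_papers(query_topics, paper_topics, k=2):
    """Return the ids of the k papers whose topic vectors are closest to the query."""
    ranked = sorted(
        paper_topics,
        key=lambda paper_id: cosine_distance(query_topics, paper_topics[paper_id]),
    )
    return ranked[:k]


# Hypothetical topic distributions over three topics (assumed, not from the paper).
papers = {
    "paper_a": [0.8, 0.1, 0.1],
    "paper_b": [0.1, 0.8, 0.1],
    "paper_c": [0.6, 0.3, 0.1],
}
query = [0.7, 0.2, 0.1]

print(k_nearest_papers(query, papers, k=2))
```

In practice the topic vectors would come from a fitted topic model, and the distance metric and value of k would need tuning against relevance judgments.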

Bringing IBM NLP capabilities to the CORD-19 Dataset

#artificialintelligence | Jul-14-2020, 20:00:55 GMT

To assist in the fight against the COVID-19 pandemic, prominent research institutes led by the Allen Institute for AI (AI2) released the COVID-19 Open Research Dataset (CORD-19) earlier this year. Comprised of scientific articles related to COVID-19, SARS-CoV-2, and related coronaviruses, the dataset (which at the time of writing contains more than 75,000 full-text scientific papers) is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease (1,2). While a tremendous resource, the dataset initially did not include information found in tables due to the difficulty of extracting tabular data. However, following the launch of the Kaggle challenge associated with CORD-19, table information became the feature most requested by challenge participants. Recognizing that critical scientific facts and data are often organized in a tabular format, IBM Research AI offered to apply our extensive experience in document and table conversion to update the CORD-19 dataset and, in turn, open up additional critical information to the global science and medical community in efforts to fight COVID-19.

artificial intelligence, information, natural language, (14 more...)

#artificialintelligence

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Switzerland > Geneva > Geneva (0.05)
Asia > India (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Answering Questions on COVID-19 in Real-Time

Lee, Jinhyuk, Yi, Sean S., Jeong, Minbyul, Sung, Mujeen, Yoon, Wonjin, Choi, Yonghwa, Ko, Miyoung, Kang, Jaewoo

arXiv.org Artificial Intelligence | Jun-29-2020

The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it. One reason the fight is difficult is the lack of information and knowledge. In this work, we outline our effort to contribute to shrinking this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to provide answers to questions in real-time. Our system leverages both supervised and unsupervised approaches to provide informative answers using DenSPI (Seo et al., 2019) and BEST (Lee et al., 2016). Evaluation of covidAsk is carried out by using a manually created dataset called COVID-19 Questions which is based on facts about COVID-19. We hope our system will be able to aid researchers in their search for knowledge and information not only for COVID-19 but for future pandemics as well.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2006.1583

Country: Europe > Austria > Vienna (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Information Management (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19

Oniani, David, Wang, Yanshan

arXiv.org Artificial Intelligence | Jun-23-2020

COVID-19 has resulted in an ongoing pandemic and as of 12 June 2020, has caused more than 7.4 million cases and over 418,000 deaths. The highly dynamic and rapidly evolving situation with COVID-19 has made it difficult to access accurate, on-demand information regarding the disease. Online communities, forums, and social media provide potential venues to search for relevant questions and answers, or post questions and seek answers from other members. However, due to the nature of such sites, there are always a limited number of relevant questions and responses to search from, and posted questions are rarely answered immediately. With the advancements in the field of natural language processing, particularly in the domain of language models, it has become possible to design chatbots that can automatically answer consumer questions. However, such models are rarely applied and evaluated in the healthcare domain, to meet the information needs with accurate and up-to-date healthcare data. In this paper, we propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses. We utilized the GPT-2 language model and applied transfer learning to retrain it on the COVID-19 Open Research Dataset (CORD-19) corpus. In order to improve the quality of the generated responses, we applied 4 different approaches, namely tf-idf, BERT, BioBERT, and USE to filter and retain relevant sentences in the responses. In the performance evaluation step, we asked two medical experts to rate the responses. We found that BERT and BioBERT, on average, outperform both tf-idf and USE in relevance-based sentence filtering tasks. Additionally, based on the chatbot, we created a user-friendly interactive web application to be hosted online.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2006.10964

Country:

North America > United States > Minnesota > Olmsted County > Rochester (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
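
The abstract of this entry lists tf-idf as one of four approaches used to filter and retain relevant sentences in generated responses. A minimal sketch of that idea follows, under stated assumptions: the tokenizer, the smoothed idf formula, the cosine-similarity threshold of 0.2, and the example texts are illustrative choices, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse tf-idf vector (a dict of term -> weight) per text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed idf (an assumed variant) keeps every weight positive.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_sentences(question, sentences, threshold=0.2):
    """Keep only the sentences whose tf-idf similarity to the question meets the threshold."""
    vectors = tfidf_vectors([question] + sentences)
    question_vec, sentence_vecs = vectors[0], vectors[1:]
    return [s for s, vec in zip(sentences, sentence_vecs)
            if cosine(question_vec, vec) >= threshold]


# Hypothetical question and generated response sentences.
question = "What are the symptoms of COVID-19?"
response = [
    "Common symptoms of COVID-19 include fever and cough.",
    "Stock markets closed higher today.",
]
print(filter_sentences(question, response))
```

The BERT, BioBERT, and USE variants named in the abstract would replace the tf-idf vectors with dense sentence embeddings while keeping the same similarity-and-threshold structure.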

New tools aim to tame pandemic paper tsunami

Science | May-29-2020, 06:17:20 GMT

Science's COVID-19 coverage is supported by the Pulitzer Center. Timothy Sheahan, a virologist studying COVID-19, wishes he could keep pace with the growing torrent of new scientific papers related to the pandemic. But there have just been too many--more than 5000 papers a week. "I'm not keeping up," says Sheahan, who works at the University of North Carolina, Chapel Hill. A loose-knit army of data scientists and software developers is pressing hard to change that.

artificial intelligence, machine learning, natural language, (15 more...)

Science

Country:

North America > United States > North Carolina > Orange County > Chapel Hill (0.25)
North America > United States > Washington > King County > Seattle (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.49)
Information Technology > Artificial Intelligence > Machine Learning (0.31)

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

Tang, Raphael, Nogueira, Rodrigo, Zhang, Edwin, Gupta, Nikhil, Cam, Phuong, Cho, Kyunghyun, Lin, Jimmy

arXiv.org Artificial Intelligence | Apr-23-2020

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge. To our knowledge, this is the first publicly available resource of its type, and it is intended as a stopgap measure for guiding research until more substantial evaluation resources become available. While this dataset, comprising 124 question-article pairs as of the present version 0.1 release, does not have sufficient examples for supervised machine learning, we believe that it can be helpful for evaluating the zero-shot or transfer capabilities of existing models on topics specifically related to COVID-19. This paper describes our methodology for constructing the dataset and presents the effectiveness of a number of baselines, including term-based techniques and various transformer-based models. The dataset is available at http://covidqa.ai/

dataset, effectiveness, natural language question, (14 more...)

arXiv.org Artificial Intelligence

2004.11339

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(8 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)
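
The abstract of this entry mentions term-based techniques among its baselines without naming them here. BM25 is a common term-based ranking function, so the sketch below shows how such a baseline might score candidate passages against a question. It is an illustration under assumptions, not the authors' code: the parameter values k1=1.2 and b=0.75, the tokenizer, and the example passages are all chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    total_len = sum(len(d) for d in docs)
    if n == 0 or total_len == 0:
        return [0.0] * n
    avg_len = total_len / n
    # Document frequency: count each term once per document.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical passages and question.
passages = [
    "The incubation period of the virus is about five days.",
    "Masks reduce transmission in crowded indoor settings.",
]
scores = bm25_scores("incubation period", passages)
print(scores)
```

A zero-shot evaluation on a small question-article set would rank all candidate passages by these scores and check where the annotated answer passage lands.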

Fighting the Covid-19: All the datasets and data efforts in one place

#artificialintelligence | Apr-7-2020, 00:52:53 GMT

Since the coronavirus erupted into our world, research institutes and governments have publicly released many databases so that research groups (and independent individuals) can analyze data on its spread. These databases are scattered across numerous initiatives and sources. The purpose of this blog post is to organize all the major open databases and data initiatives around the world. Feel free to add missing ones in the comments or through this form. In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19).

covid-19, outbreak, repository, (10 more...)

#artificialintelligence

Country:

North America > United States > Washington (0.05)
Asia > China > Hubei Province > Wuhan (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (0.69)

Technology: Information Technology > Artificial Intelligence (1.00)

CORD-19: Database of scientific articles launched to help AI fight COVID-19

#artificialintelligence | Mar-23-2020, 11:06:38 GMT

Earlier this week, five organisations released an open dataset – CORD-19 – containing nearly 30 000 scientific articles in the hope that artificial intelligence can use the data to help combat the spread of COVID-19 infections. These articles were previously published in journals or saved on pre-print servers. CORD-19 is short for COVID-19 Open Research Dataset. The CORD-19 dataset was released after the Trump administration issued a "call to action" for the tech community to develop AI (artificial intelligence) techniques to curb the spread of COVID-19 infections. In addition, Michael Kratsios, US Chief Technology Officer at The White House, explained that "decisive action from America's science and technology enterprise" was needed to prevent, detect, treat and develop a cure for COVID-19.

covid-19, help ai fight covid-19, scientific article, (9 more...)

#artificialintelligence

Country: