AITopics

2503.18182

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)

arXiv.org Artificial Intelligence Jul-25-2024

Constructing the CORD-19 Vaccine Dataset

Singh, Manisha, Sharma, Divy, Ma, Alonso, Tyree, Bridget, Mitchell, Margaret

We introduce a new dataset, 'CORD-19-Vaccination', to cater to scientists specifically looking into COVID-19 vaccine-related research. This dataset is extracted from the CORD-19 dataset [Wang et al., 2020] and augmented with new columns for language detail, author demography, keywords, and topic per paper. Facebook's fastText model is used to identify languages [Joulin et al., 2016]. To establish author demography (author affiliation, lab/institution location, and lab/institution country columns), we processed the JSON file for each paper and then further enhanced the results using Google's search API to determine country values. 'Yake' was used to extract keywords from the title, abstract, and body of each paper, and the LDA (Latent Dirichlet Allocation) algorithm was used to add topic information [Campos et al., 2020, 2018a,b]. To evaluate the dataset, we demonstrate a question-answering task like the one used in the CORD-19 Kaggle challenge [Goldbloom et al., 2022]. For further evaluation, sequential sentence classification was performed on each paper's abstract using the model from Dernoncourt et al. [2016]. We partially hand-annotated the training dataset and used a pre-trained BERT-PubMed layer. 'CORD-19-Vaccination' contains 30k research papers and can be immensely valuable for NLP research such as text mining, information extraction, and question answering, specific to the domain of COVID-19 vaccine research.

cord-19 dataset, cord-19-vaccination, dataset, (12 more...)

2407.18471

Country:

South America > Brazil (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland (0.04)
(6 more...)

Genre: Research Report (0.84)

Industry: Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)

Bose, Arusarka, Zhou, Zili, Xu, Guandong

COV19IR : COVID-19 Domain Literature Information Retrieval

arXiv.org Artificial Intelligence Nov-8-2022

The increasing volume of COVID-19 research literature poses new challenges for effective literature screening and COVID-19 domain-knowledge-aware information retrieval. To tackle these challenges, we demonstrate two tasks along with solutions: COVID-19 literature retrieval and question answering. The COVID-19 literature retrieval task screens matching COVID-19 literature documents for a textual user query, and the COVID-19 question answering task predicts proper text fragments from the text corpus as the answer to specific COVID-19 related questions. Based on a transformer neural network, we provide solutions that implement the tasks on the CORD-19 dataset, and we display some examples to show the effectiveness of our proposed solutions.

information retrieval, machine learning, question answering, (17 more...)

2211.04013

Country:

Asia > Middle East > Saudi Arabia (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.86)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

arXiv.org Artificial Intelligence Nov-29-2021

Testing the Generalization of Neural Language Models for COVID-19 Misinformation Detection

Wahle, Jan Philip, Ashok, Nischal, Ruas, Terry, Meuschke, Norman, Ghosal, Tirthankar, Gipp, Bela

A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods predominantly target specific content types (e.g., news) or platforms (e.g., Twitter). The methods' capabilities to generalize were largely unclear so far. We evaluate fifteen Transformer-based models on five COVID-19 misinformation datasets that include social media posts, news articles, and scientific papers to fill this gap. We show tokenizers and models tailored to COVID-19 data do not provide a significant advantage over general-purpose ones. Our study provides a realistic assessment of models for detecting COVID-19 misinformation. We expect that evaluating a broad spectrum of datasets and models will benefit future research in developing misinformation detection systems.

covid-19, dataset, misinformation, (13 more...)

2111.07819

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

#artificialintelligence Oct-7-2021, 13:40:38 GMT

Gaining a sense of control over the COVID-19 pandemic

How one Kaggler took top marks across multiple COVID-related challenges. Today we interview Daniel, whose notebooks earned him top marks in Kaggle's CORD-19 challenges. Kaggle hosted multiple challenges that worked with the Kaggle CORD-19 dataset, and Daniel won 1st place three times, including by a huge margin in the TREC-COVID challenge. My research interests include probabilistic forecasting, causal inference and machine learning. As part of the Kaggle CORD-19 challenge I developed discovid.ai. I'm also a student assistant where I've worked on several data science projects for the last 3 years and had the opportunity to work with real-world data from different companies in highly diverse domains -- from predicting the waste in a sawmill to analyzing flaws in the process of surface galvanization and testing the efficiency of a marketing campaign.

search engine, student assistant, topic model, (15 more...)

Country:

Europe > Italy (0.05)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)

Industry:

Education > Curriculum > Subject-Specific Education (0.48)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.42)
Health & Medicine > Therapeutic Area > Immunology (0.42)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.77)

#artificialintelligence Jun-14-2021, 12:25:31 GMT

When his hobbies went on hiatus, this Kaggler made fighting COVID-19 with data his mission

Most medical articles have methods & results sections, and matches in those sections are more important. I had little to no expectations entering this competition, so I wouldn't say I was surprised by anything. It was great to see so many smart and capable people all working together to try to help in whatever way they could. All of the work is driven by the Kaggle platform. The list of notebooks covers all the submissions for Round 1 and Round 2 of the CORD-19 challenge. All of the notebooks are in Python.

cord-19 dataset, covid-19, custom bert qa model, (13 more...)

Genre: Personal (0.35)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Information Management (0.97)

#artificialintelligence Oct-27-2020, 22:45:45 GMT

How Elsevier Accelerated COVID-19 research using Dask on Saturn Cloud -- Elsevier Labs

The version of CORD-19 that we used yielded 3,389,064 paragraphs and 16,952,279 sentences. Each sentence is sent to each model and yields zero or more entities. A notable point is that the process of generating entities from sentences is embarrassingly parallel, so processing multiple sentences in parallel saves processing time. To process the dataset, we used Dask, an open source library for parallel computing in Python. Dask provides multiple convenient abstractions that mimic familiar APIs such as NumPy arrays and pandas DataFrames, and these abstractions can operate on datasets that do not fit in main memory.
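
The "embarrassingly parallel" pattern described above can be sketched as follows. This is a minimal illustration and not Elsevier's pipeline: it swaps in the standard library's `concurrent.futures` for Dask so that it runs without extra dependencies, and `KNOWN_ENTITIES`, `extract_entities`, and `extract_all` are hypothetical names, with a toy vocabulary lookup standing in for a real entity-recognition model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy vocabulary standing in for a real entity-recognition model.
KNOWN_ENTITIES = {"covid-19", "sars-cov-2", "remdesivir"}


def extract_entities(sentence: str) -> list[str]:
    """Return zero or more entities found in a single sentence."""
    tokens = (tok.strip(".,;:()").lower() for tok in sentence.split())
    return [tok for tok in tokens if tok in KNOWN_ENTITIES]


def extract_all(sentences: list[str], workers: int = 4) -> list[list[str]]:
    """Process each sentence independently of the others.

    No sentence depends on another's result, so the work can be split
    across workers freely; pool.map() returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_entities, sentences))


if __name__ == "__main__":
    sample = [
        "Remdesivir was evaluated against SARS-CoV-2.",
        "The weather was mild.",
        "COVID-19 spread rapidly.",
    ]
    for sentence, entities in zip(sample, extract_all(sample)):
        print(entities, "<-", sentence)
```

A thread pool only illustrates the shape of the computation; for CPU-bound models on a corpus of this size, a process pool or a distributed scheduler such as Dask would be the more likely choice.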

artificial intelligence, cord-19 dataset, natural language, (14 more...)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.32)

#artificialintelligence Jul-14-2020, 20:00:55 GMT

Bringing IBM NLP capabilities to the CORD-19 Dataset

To assist in the fight against the COVID-19 pandemic, prominent research institutes led by the Allen Institute for AI (AI2) released the COVID-19 Open Research Dataset (CORD-19) earlier this year. Comprising scientific articles related to COVID-19, SARS-CoV-2, and related coronaviruses, the dataset (which at the time of writing contains more than 75,000 full-text scientific papers) is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease (1,2). While a tremendous resource, the dataset initially did not include information found in tables due to the difficulty of extracting tabular data. However, following the launch of the Kaggle challenge associated with CORD-19, table information rose to become the most requested feature by challenge participants. Recognizing that critical scientific facts and data are often organized in a tabular format, IBM Research AI offered to apply our extensive experience in document and table conversion to update the CORD-19 dataset and, in turn, open up additional critical information to the global science and medical community in efforts to fight COVID-19.

artificial intelligence, information, natural language, (14 more...)

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Switzerland > Geneva > Geneva (0.05)
Asia > India (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence Jul-3-2020

Coronavirus Knowledge Graph: A Case Study

Chen, Chongyan, Ebeid, Islam Akef, Bu, Yi, Ding, Ying

The emergence of the novel COVID-19 pandemic has had a significant impact on global healthcare and the economy over the past few months. The virus's rapid spread has led to a proliferation of biomedical research addressing the pandemic and its related topics. One of the essential Knowledge Discovery tools that could help the biomedical research community understand and eventually find a cure for COVID-19 is the Knowledge Graph. The CORD-19 dataset is a collection of publicly available full-text research articles that have been recently published on COVID-19 and coronavirus topics. Here, we use several Machine Learning, Deep Learning, and Knowledge Graph construction and mining techniques to formalize and extract insights from the PubMed dataset and the CORD-19 dataset to identify COVID-19 related experts and bio-entities. In addition, we suggest possible techniques to predict related diseases, drug candidates, genes, gene mutations, and related compounds as part of a systematic effort to apply Knowledge Discovery methods to help biomedical researchers tackle the pandemic.

data mining, machine learning, natural language, (19 more...)

2007.10287

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

#artificialintelligence May-2-2020, 06:43:32 GMT

COVID-19 Datasets Bring AI Experts, Life Sciences Researchers Together For A Cure - AI Trends

All of the Bio-IT community is eager to contribute to plans for treatments, diagnostics and vaccines for SARS-CoV-2 and the resulting disease, COVID-19. Companies are donating consulting services, compute resources, tools for clinical trials, and so much more. But the biggest donations might be the sheer volume of data being pooled for researchers to mine for answers. On March 16, the Allen Institute for AI (AI2), Chan Zuckerberg Initiative (CZI), Georgetown University's Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) released the COVID-19 Open Research Dataset (CORD-19). The dataset, accessible through the Allen Institute for AI's Semantic Scholar platform, includes scholarly literature about COVID-19, SARS-CoV-2, and the coronavirus group.

covid-19, dataset, semantic scholar, (12 more...)

Genre: Research Report > Experimental Study (0.51)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.49)