AITopics

This article provides an overview of BioASQ, a new competition on biomedical semantic indexing and question answering (QA). BioASQ aims to push towards systems that will allow biomedical workers to express their information needs in natural language and that will return concise and user-understandable answers by combining information from multiple sources of different kinds, including biomedical articles, databases, and ontologies. BioASQ encourages participants to adopt semantic indexing as a means to combine multiple information sources and to facilitate the matching of questions to answers. It also adopts a broad semantic indexing and QA architecture that subsumes current relevant approaches, even though no current system instantiates all of its components. Hence, the architecture can also be seen as our view of how relevant work from fields such as information retrieval, hierarchical classification, question answering, ontologies, and linked data can be combined, extended, and applied to biomedical question answering. BioASQ will develop publicly available benchmarks and it will adopt and possibly refine existing evaluation measures. The evaluation infrastructure of the competition will remain publicly available beyond the end of BioASQ.

information, natural language, question answering

Country:

Europe > Greece > Attica > Athens (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Germany > Saxony > Dresden (0.04)

Genre: Overview (0.55)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Efficient Classification of Clinical Reports Utilizing Natural Language Processing

Sarioglu, Efsun (The George Washington University) | Yadav, Kabir (The George Washington University) | Choi, Hyeong-Ah (The George Washington University)

The recent emphasis on health information technology has highlighted the importance of leveraging the large amount of electronic clinical data to help guide medical decision-making. Developing such clinical decision aids requires manual review of many past patient reports in order to generate a good predictive model. In this research, we investigate classification of clinical reports using natural language processing (NLP). The proposed system uses NLP to generate structured output from computed tomography (CT) reports and then machine learning techniques to code for the presence of clinically important injuries for traumatic orbital fracture victims. Our results show that NLP improves upon raw text classification results.

artificial intelligence, classification, machine learning

Country:

North America > United States > District of Columbia > Washington (0.06)
North America > United States > Pennsylvania (0.05)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.69)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)
Health & Medicine > Diagnostic Medicine > Imaging (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics

Lamb, Alex (Johns Hopkins University) | Paul, Michael J. (Johns Hopkins University) | Dredze, Mark (Johns Hopkins University)

Recent studies have shown an ability to track influenza rates from Twitter, since Twitter users tweet about their illnesses (“i am home sick with the flu”). However, users may also tweet concerned awareness of illness (“don’t want to get sick, need a flu shot”). Identifying these messages can support computational epidemic response models. We present preliminary results for mining tweets that express concerned awareness of influenza. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.

artificial intelligence, machine learning, tweet

Country: North America > United States > Maryland > Baltimore (0.05)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Global and Local Approach of Part-of-Speech Tagging for Large Corpora

Yu, Shi (University of Chicago) | Grossman, Robert (University of Chicago) | Rzhetsky, Andrey (University of Chicago)

We present Global-Local POS tagging, a framework to train generative stochastic Part-of-Speech models on large corpora. Global Taggers offer several advantages over their counterparts trained on small, curated corpora, including the ability to automatically extend and update their models to new text. Global Taggers also avoid a fundamental limitation of current models, whose performance relies heavily on curated text with manually assigned labels. We illustrate our approach by training several Global Taggers, implemented with generative stochastic models, on two large corpora using a high-performance computing architecture. We further demonstrate that Global Taggers can be improved by incorporating models trained on curated text, called Local Taggers, for better tagging performance on specific topics.

data mining, machine learning, tagger

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Towards Effective Representation of Clinical Documents for Search and Retrieval

Davis, Anthony R. (3M Health Information Systems) | Nossal, Michael (3M Health Information Systems) | Ober, N. Stephen (3M Health Information Systems)

Recent studies have demonstrated the advantages of structured search of PubMed abstracts when compared with unstructured keyword search. We explore whether search on clinical text is similarly enhanced by representing domain-specific structures, information, and knowledge. Examples include representations of document structure and sections, local context such as negation, and appropriate modeling of scalar quantities. We examine tasks ranging from recruitment of suitable patients for studies, to chronic disease prevention and management, to longitudinal studies of individual patients or groups, as well as comparative experiments performed on an NLP-enhanced clinical search tool that operates on large corpora of clinical text.

artificial intelligence, effective representation, natural language

Country: North America > United States > Maryland > Montgomery County > Bethesda (0.05)

Genre: Research Report > New Finding (0.96)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (0.70)
Health & Medicine > Health Care Technology > Medical Record (0.30)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.72)

PROBE: Periodic Random Orbiter Algorithm for Machine Learning

Smith, Larry (National Institutes of Health) | Kim, Won (National Institutes of Health) | Wilbur, W. John

We present a new algorithm, which we call PROBE, to find the minimum of a convex function. Such a minimization is important in many machine learning methods, including Support Vector Machines (SVM). We show that PROBE is a viable alternative to published algorithms for SVM learning with several important advantages. PROBE is a simple and easily programmed algorithm, with a well-defined, parametrized stopping criterion; it is not limited to SVM, but can be applied to other convex loss functions, such as the Huber and Maximum Entropy models; and its time and memory requirements are consistently modest in handling very large training sets.

algorithm, artificial intelligence, machine learning

Country: North America > United States > Maryland > Montgomery County > Bethesda (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

OCR-Based Image Features for Biomedical Image and Article Classification: Identifying Documents Relevant to Genomic Cis-Regulatory Elements

Images form a significant, yet under-utilized, information source in published biomedical articles. Much current work on biomedical image retrieval and classification uses simple, standard image representation employing features such as edge direction or gray scale histograms. In our earlier work we have used such features as well to classify images, where image-class-tags have been used to represent and classify complete articles. Here we focus on a different literature classification task: identifying articles discussing cis-regulatory elements and modules, motivated by the need to understand complex gene-networks. Curators attempting to identify such articles use as a major cue a certain type of image in which the conserved cis-regulatory region on the DNA is shown. Our experiments show that automatically identifying such images using common image features (such as gray scale) is highly error prone. However, using Optical Character Recognition (OCR) to extract alphabet characters from images, calculating character distribution and using the distribution parameters as image features, forms a novel image representation, which allows us to identify DNA-content in images with high precision and recall (over 0.9). Utilizing the occurrence of DNA-rich images within articles, we train a classifier to identify articles pertaining to cis-regulatory elements with a similarly high precision and recall. Using OCR-based image features has much potential beyond the current task, to identify other types of biomedical sequence-based images showing DNA, RNA and proteins. Moreover, automatically identifying such images is applicable beyond the current use-case, in other important biomedical document classification tasks.
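The character-distribution idea in the abstract above can be illustrated with a minimal sketch. It assumes the OCR step has already produced a text string for each image; the function names, the A-Z feature layout, and the sample strings are hypothetical and are not taken from the paper, whose exact distribution parameters are not given here.

```python
from collections import Counter
import string

DNA_LETTERS = "ACGT"  # assumption: DNA-rich panels are dominated by these letters


def char_distribution(ocr_text):
    """Return the relative frequency of each letter A-Z among the letters in ocr_text."""
    letters = [c for c in ocr_text.upper() if c in string.ascii_uppercase]
    counts = Counter(letters)
    total = len(letters)
    # Counter returns 0 for missing letters, so every letter gets an entry.
    return {c: (counts[c] / total if total else 0.0) for c in string.ascii_uppercase}


def dna_fraction(ocr_text):
    """Share of letters that are A, C, G, or T (one possible summary feature)."""
    dist = char_distribution(ocr_text)
    return sum(dist[c] for c in DNA_LETTERS)


# Hypothetical OCR outputs: a sequence-alignment panel and an ordinary figure label.
sequence_panel = "ACGTTGCA GGCATCGA TTAGCCGA"
caption_like = "Expression levels in wild type embryos"

print(dna_fraction(sequence_panel))  # close to 1.0
print(dna_fraction(caption_like))    # much lower
```

A vector such as the output of `char_distribution` could then be passed to any standard classifier; which classifier and which parameters the authors actually used is not stated in the abstract.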

artificial intelligence, machine learning, representation

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions

Paul, Michael J. (Johns Hopkins University) | Dredze, Mark (Johns Hopkins University)

Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of multi-dimensional latent text models, such as factorial LDA, that capture orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.

machine learning, natural language, tuple

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Maryland > Baltimore (0.04)
North America > Puerto Rico (0.04)
Europe > United Kingdom (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.97)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.96)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.51)

Subgraph Matching-Based Literature Mining for Biomedical Relations and Events

Liu, Haibin (University of Colorado School of Medicine) | Keselj, Vlado (Dalhousie University) | Blouin, Christian (Dalhousie University) | Verspoor, Karin (National ICT Australia)

Extracting important relations between biological components and semantic events involving genes or proteins from literature has become a focus for the biomedical text mining community. In this paper, we review a subgraph matching-based approach proposed in our previous work for mining relations and events in the biomedical literature. Our subgraph matching algorithm is formally presented, along with a detailed analysis of its complexity. We present three different relation/event extraction tasks in which our approach has been successfully applied. Our approach is of considerable value in extracting highly precise, binary relations when appropriate training data is available.

extraction, machine learning, natural language

Country:

Oceania > Australia (0.04)
North America > United States > Colorado > Adams County > Aurora (0.04)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)

Genre: Overview (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Integration of UMLS and MEDLINE in Unsupervised Word Sense Disambiguation

Yepes, Antonio Jimeno (National Library of Medicine) | Aronson, Alan R. (National Library of Medicine)

Scarcity of training data for word sense disambiguation argues for the use of knowledge-based disambiguation methods, which rely on information available in terminological resources. Unfortunately, these resources are not generally optimized to perform word sense disambiguation. On the other hand, there are many examples of ambiguous biomedical words with context in MEDLINE. However, these examples of ambiguity are not labeled with their proper sense. We propose the integration of the UMLS and MEDLINE to create concept profiles which are used to perform knowledge-based word sense disambiguation. Our results show an accuracy of 0.8770 on a biomedical word sense disambiguation data set; this represents a statistically significant improvement over other knowledge-based methods based on the UMLS on this data set.
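As a rough illustration of profile-based disambiguation in general, the sketch below scores each candidate sense by the similarity between its profile and the context of the ambiguous word. Everything in it is an assumption made for illustration: the abstract does not say how concept profiles are built from the UMLS and MEDLINE or how they are compared, so bag-of-words profiles, cosine similarity, the concept names, and the toy texts are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, concept_profiles):
    """Pick the candidate concept whose profile is most similar to the context."""
    ctx = bag_of_words(context)
    return max(concept_profiles, key=lambda c: cosine(ctx, concept_profiles[c]))


# Hypothetical toy profiles for two senses of "cold"; real profiles would be
# derived from terminological resources and literature, not written by hand.
profiles = {
    "common_cold": bag_of_words("virus infection cough fever nasal symptoms patients"),
    "cold_temperature": bag_of_words("temperature exposure freezing water storage degrees"),
}

print(disambiguate("patients with cold reported cough and fever", profiles))
```

No labeled training examples are needed in this setup: the only inputs are the candidate profiles and the context, which is what makes an approach of this kind knowledge-based.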

ambiguous word, machine learning, natural language

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)