AITopics

In the context of two related NIH projects supporting scientific collaboration, we seek to implement an environment for collaborative information retrieval and analysis based on utility theory.

artificial intelligence, information retrieval, natural language, (14 more...)

Country:

North America > United States > New York (0.05)
North America > United States > New Hampshire (0.05)
Asia > Middle East > Lebanon (0.05)
(4 more...)

Industry: Health & Medicine (0.51)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

OCR-Based Image Features for Biomedical Image and Article Classification: Identifying Documents Relevant to Genomic Cis-Regulatory Elements

Images form a significant, yet under-utilized, information source in published biomedical articles. Much current work on biomedical image retrieval and classification uses simple, standard image representations employing features such as edge direction or gray-scale histograms. In our earlier work we also used such features to classify images, and used the resulting image-class tags to represent and classify complete articles. Here we focus on a different literature classification task: identifying articles discussing cis-regulatory elements and modules, motivated by the need to understand complex gene networks. Curators attempting to identify such articles use as a major cue a certain type of image in which the conserved cis-regulatory region on the DNA is shown. Our experiments show that automatically identifying such images using common image features (such as gray scale) is highly error-prone. However, using Optical Character Recognition (OCR) to extract alphabetic characters from images, calculating the character distribution, and using the distribution parameters as image features forms a novel image representation, which allows us to identify DNA content in images with high precision and recall (over 0.9). Utilizing the occurrence of DNA-rich images within articles, we train a classifier to identify articles pertaining to cis-regulatory elements with similarly high precision and recall. OCR-based image features have much potential beyond the current task, for identifying other types of biomedical sequence-based images showing DNA, RNA and proteins. Moreover, automatically identifying such images is applicable beyond the current use case, in other important biomedical document classification tasks.
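As a rough illustration of the character-distribution idea, the sketch below assumes the OCR step has already produced a text string for an image and derives a few simple distribution features from it. The function name `char_distribution_features` and the particular features chosen are illustrative assumptions, not the feature set used in the paper.

```python
from collections import Counter

# Letters that make up a DNA sequence.
DNA_LETTERS = set("ACGT")


def char_distribution_features(ocr_text: str) -> dict:
    """Compute simple character-distribution features from OCR output.

    Hypothetical feature set, for illustration only.
    """
    # Keep alphabetic characters only, ignoring case.
    letters = [c for c in ocr_text.upper() if c.isalpha()]
    total = len(letters)
    if total == 0:
        return {"n_letters": 0, "dna_fraction": 0.0, "distinct_letters": 0}

    counts = Counter(letters)
    dna_count = sum(counts[c] for c in DNA_LETTERS)
    return {
        "n_letters": total,                # how much text the image contains
        "dna_fraction": dna_count / total, # share of letters that are A, C, G or T
        "distinct_letters": len(counts),   # DNA-rich images use few distinct letters
    }
```

An image showing a sequence alignment would be expected to yield a high `dna_fraction` and few distinct letters, whereas ordinary figure labels spread over many letters; features of this kind could then be passed to any standard classifier.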

artificial intelligence, machine learning, representation, (17 more...)

Country:

North America > United States > Delaware > New Castle County > Newark (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Notes about the OntoGene Pipeline

Rinaldi, Fabio (University of Zurich) | Clematide, Simon (University of Zurich) | Schneider, Gerold (University of Zurich) | Grigonyte, Gintare (University of Zurich)

In this paper we describe the architecture of the OntoGene relation mining pipeline and some of its recent applications. With this research overview paper we intend to contribute to the recently started discussion on standards for information extraction architectures in the biomedical domain. Our approach delivers domain entities mentioned in each input document, as well as candidate relationships, both ranked according to a confidence score computed by the system. This information is presented to the user through an advanced interface aimed at supporting the process of interactive curation.

data mining, information retrieval, machine learning, (21 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
(3 more...)

Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions

Paul, Michael J. (Johns Hopkins University) | Dredze, Mark (Johns Hopkins University)

Clinical research of new recreational drugs and trends requires mining current information from non-traditional text sources. In this work we support such research through the use of multi-dimensional latent text models, such as factorial LDA, that capture orthogonal factors of corpora, creating structured output for researchers to better understand the contents of a corpus. Since a purely unsupervised model is unlikely to discover specific factors of interest to clinical researchers, we modify the structure of factorial LDA to incorporate prior knowledge, including the use of observed variables, informative priors and background components. The resulting model learns factors that correspond to drug type, delivery method (smoking, injection, etc.), and aspect (chemistry, culture, effects, health, usage). We demonstrate that the improved model yields better quantitative and more interpretable results.

machine learning, natural language, tuple, (19 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Maryland > Baltimore (0.04)
North America > Puerto Rico (0.04)
Europe > United Kingdom (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.97)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.96)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.51)

Subgraph Matching-Based Literature Mining for Biomedical Relations and Events

Liu, Haibin (University of Colorado School of Medicine) | Keselj, Vlado (Dalhousie University) | Blouin, Christian (Dalhousie University) | Verspoor, Karin (National ICT Australia)

Extracting important relations between biological components and semantic events involving genes or proteins from literature has become a focus for the biomedical text mining community. In this paper, we review a subgraph matching-based approach proposed in our previous work for mining relations and events in the biomedical literature. Our subgraph matching algorithm is formally presented, along with a detailed analysis of its complexity. We present three different relation/event extraction tasks in which our approach has been successfully applied. Our approach is of considerable value in extracting highly precise, binary relations when appropriate training data is available.
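To make the general idea of subgraph matching concrete, here is a minimal sketch of a generic backtracking search over labeled directed graphs. It is not the authors' algorithm, whose details are not given in this abstract; the graph encoding, the function name `find_subgraph_matches`, and the node and edge labels are all assumptions made for illustration.

```python
def find_subgraph_matches(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Yield injective mappings from pattern nodes to graph nodes.

    Nodes are given as {node_id: label}; edges as (src, edge_label, dst).
    A mapping is valid if node labels agree and every pattern edge
    maps onto an edge of the graph with the same label and direction.
    """
    order = list(pattern_nodes)
    graph_edge_set = set(graph_edges)

    def consistent(mapping):
        # Check every pattern edge whose endpoints are both already mapped.
        for src, label, dst in pattern_edges:
            if src in mapping and dst in mapping:
                if (mapping[src], label, mapping[dst]) not in graph_edge_set:
                    return False
        return True

    def extend(i, mapping):
        if i == len(order):
            yield dict(mapping)
            return
        p = order[i]
        for g, g_label in graph_nodes.items():
            # Enforce injectivity and matching node labels.
            if g in mapping.values() or g_label != pattern_nodes[p]:
                continue
            mapping[p] = g
            if consistent(mapping):
                yield from extend(i + 1, mapping)
            del mapping[p]

    yield from extend(0, {})


# Hypothetical rule: a trigger word with a protein subject and a protein object.
pattern_nodes = {"e": "TRIGGER", "a": "PROTEIN", "b": "PROTEIN"}
pattern_edges = [("e", "nsubj", "a"), ("e", "dobj", "b")]

# Hypothetical dependency graph of a sentence.
graph_nodes = {"binds": "TRIGGER", "p53": "PROTEIN", "MDM2": "PROTEIN", "strongly": "OTHER"}
graph_edges = [("binds", "nsubj", "p53"), ("binds", "dobj", "MDM2"), ("binds", "advmod", "strongly")]

matches = list(find_subgraph_matches(pattern_nodes, pattern_edges, graph_nodes, graph_edges))
```

In this toy case the only valid mapping sends `e` to `binds`, `a` to `p53` and `b` to `MDM2`. Exhaustive backtracking of this kind is exponential in the worst case, which is one reason a complexity analysis matters for any practical matcher.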

extraction, machine learning, natural language, (18 more...)

Country:

Oceania > Australia (0.04)
North America > United States > Colorado > Adams County > Aurora (0.04)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)

Genre: Overview (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Integration of UMLS and MEDLINE in Unsupervised Word Sense Disambiguation

Yepes, Antonio Jimeno (National Library of Medicine) | Aronson, Alan R. (National Library of Medicine)

Scarcity of training data for word sense disambiguation argues for the use of knowledge-based disambiguation methods, which rely on information available in terminological resources. Unfortunately, these resources are not generally optimized to perform word sense disambiguation. On the other hand, there are many examples of ambiguous biomedical words with context in MEDLINE. However, these examples of ambiguity are not labeled with their proper sense. We propose the integration of the UMLS and MEDLINE to create concept profiles which are used to perform knowledge-based word sense disambiguation. Our results show an accuracy of 0.8770 on a biomedical word sense disambiguation data set; this represents a statistically significant improvement over other knowledge-based methods based on the UMLS on this data set.
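One simple way to picture profile-based disambiguation is as a nearest-profile lookup: each candidate sense has a bag-of-words profile, and the sense whose profile is most similar to the context of the ambiguous word is chosen. The sketch below assumes cosine similarity over raw word counts; the sense names, profile contents and function names are invented for illustration and are not taken from the UMLS, MEDLINE or the paper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context: str, profiles: dict) -> str:
    """Return the sense whose profile is closest to the context words."""
    ctx = Counter(context.lower().split())
    return max(profiles, key=lambda sense: cosine(ctx, profiles[sense]))


# Hypothetical concept profiles for the ambiguous word "cold".
profiles = {
    "common_cold": Counter({"virus": 3, "cough": 2, "symptoms": 2, "infection": 1}),
    "cold_temperature": Counter({"temperature": 3, "exposure": 2, "water": 1, "storage": 1}),
}

sense = disambiguate("patients with cough and other symptoms of the cold", profiles)
```

Here `sense` is `"common_cold"`, because only that profile shares words with the context. No sense-labeled training examples are needed, which is the appeal of knowledge-based methods when annotated data are scarce.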

ambiguous word, machine learning, natural language, (18 more...)

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Automatic Formalization of Clinical Practice Guidelines

Gerber, Matthew (University of Virginia) | Brown, Donald (University of Virginia) | Harrison, James (University of Virginia)

Current efforts aim to incorporate knowledge from clinical practice guidelines (CPGs) into computer systems using sophisticated interchange formats. Due to their complexity, such formats require expensive manual formalization work. This paper presents a preliminary study of using natural language processing (NLP) to automatically formalize CPG recommendations. We developed a CPG representation using concepts from the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED–CT), and manually applied this representation to a sample of CPG recommendations that is representative of multiple medical domains and recommendation types. Using this resource, we trained and evaluated a supervised classification model that formalizes new CPG recommendations according to the SNOMED–CT representation, achieving a precision of 75% and recall of 42% (F1 = 54%). We have identified two important lines of future investigation: (1) feature engineering to address the unique linguistic properties of CPG recommendations, and (2) alternative model formulations that are more robust to processing errors. A third line of investigation – creating additional training data for the NLP model – is shown to be of little utility.
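The reported F1 can be checked directly from the stated precision and recall, since F1 is their harmonic mean. The short sketch below does the arithmetic; the function name `f1_score` is just a local helper defined here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 75%, recall 42%.
f1 = f1_score(0.75, 0.42)
print(f"F1 = {f1:.2%}")  # 2 * 0.75 * 0.42 / 1.17, which rounds to 54%
```

Because the harmonic mean is pulled toward the smaller value, the low recall dominates the combined score.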

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
(3 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)

An Inference Method for Disease Name Normalization

Dogan, Rezarta Islamaj (National Center for Biotechnology Information) | Lu, Zhiyong (National Center for Biotechnology Information)

PubMed® and other literature databases contain a wealth of information on diseases and their diagnosis/treatment in the form of scientific publications. In order to take advantage of such rich information, several text-mining tools have been developed for automatically detecting mentions of disease names in PubMed abstracts. The next important step is the normalization of the various disease names to standardized vocabulary entries and medical dictionaries. To this end, we present an automatic approach for mapping disease names in PubMed abstracts to their corresponding concepts in Medical Subject Headings (MeSH®) or Online Mendelian Inheritance in Man (OMIM®). For developing our algorithm, we merged disease concept annotations from two existing corpora. In addition, we hand-annotated a separate test set of disease concepts for evaluating our method. Unlike others, we reformulate the disease name normalization task as an information retrieval task where input queries are disease names and search results are disease concepts. As such, our inference method builds on existing Lucene search and further improves it by taking into account the string similarity of query terms to the disease concept name and synonyms. Evaluation results show that our method compares favorably to other state-of-the-art approaches. In conclusion, we find that our approach is a simple and effective way of linking disease names to controlled vocabularies and that the merged disease corpus provides added value for the development of text mining tools for named entity recognition from biomedical text. Data is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Dogan/disease.html
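The retrieval framing can be sketched in a few lines: treat the disease mention as a query, score each concept by how closely the mention matches its name or any synonym, and return the top-ranked concepts. The sketch below does not use Lucene; it substitutes Python's standard `difflib.SequenceMatcher` for both retrieval and string similarity. The concept IDs are placeholders rather than real MeSH or OMIM identifiers, and the function names are assumptions.

```python
from difflib import SequenceMatcher

# Toy vocabulary: placeholder concept IDs mapped to a preferred name and synonyms.
VOCAB = {
    "C001": ["breast neoplasms", "breast cancer", "breast carcinoma"],
    "C002": ["colorectal neoplasms", "colorectal cancer", "colon cancer"],
    "C003": ["diabetes mellitus, type 2", "type 2 diabetes", "niddm"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize(mention: str, vocab: dict = VOCAB, top_k: int = 1) -> list:
    """Rank concepts by the best similarity between the mention and any concept name."""
    scored = []
    for concept_id, names in vocab.items():
        best = max(similarity(mention, name) for name in names)
        scored.append((best, concept_id))
    scored.sort(reverse=True)
    return [(concept_id, round(score, 3)) for score, concept_id in scored[:top_k]]


result = normalize("Breast Cancer")
```

An exact synonym match such as `"Breast Cancer"` scores 1.0 for its concept, while near matches such as plural forms still rank the intended concept first. A production system would first retrieve candidates with a search index and apply similarity only to re-rank them.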

data mining, information retrieval, natural language, (21 more...)

Country: North America > United States > Maryland > Montgomery County > Bethesda (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)
Information Technology > Data Science > Data Mining > Text Mining (0.54)

Discovering Health Beliefs in Twitter

Bhattacharya, Sanmitra (The University of Iowa) | Tran, Hung (The University of Iowa) | Srinivasan, Padmini (The University of Iowa)

Social networking websites such as Twitter have invigorated a wide range of studies in recent years, ranging from consumer opinions on products to tracking the spread of diseases. While sentiment analysis and opinion mining from tweets have been studied extensively, surveillance of beliefs, especially those related to public health, has received considerably less attention. In our previous work, we proposed a model for surveillance of health beliefs on Twitter relying on the use of hand-picked probe statements expressing various health-related propositions. In this work we extend our model to automatically discover various probes related to public health beliefs. We present a data-driven approach based on two distinct datasets and study the prevalence of public belief, disbelief or doubt for newly discovered probe statements.

artificial intelligence, natural language, social media, (18 more...)