An Inference Method for Disease Name Normalization
Dogan, Rezarta Islamaj (National Center for Biotechnology Information) | Lu, Zhiyong (National Center for Biotechnology Information)
PubMed ® and other literature databases contain a wealth of information on diseases and their diagnosis/treatment in the form of scientific publications. In order to take advantage of such rich information, several text-mining tools have been developed for automatically detecting mentions of disease names in the PubMed abstracts. The next important step is the normalization of the various disease names to standardized vocabulary entries and medical dictionaries. To this end, we present an automatic approach for mapping disease names in PubMed abstracts to their corresponding concepts in Medical Subject Headings (MeSH ® ) or Online Mendelian Inheritance in Man (OMIM ® ). For developing our algorithm, we merged disease concept annotations from two existing corpora. In addition, we hand annotated a separate test set of decease concepts for our method evaluation. Different from others, we reformulate the disease name normalization task as an information retrieval task where input queries are disease names and search results are disease concepts. As such, our inference method builds on existing Lucene search and further improves it by taking into account the string similarity of query terms to the disease concept name and synonyms. Evaluation results show that our method compares favorably to other state-of-the-art approaches. In conclusion, we find that our approach is a simple and effective way for linking disease names to controlled vocabularies and that the merged disease corpus provides added value for the development of text mining tools for named entity recognition from biomedical text. Data is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Fellows/Dogan/disease.html
Nov-5-2012
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > New Finding (0.49)
- Industry:
- Health & Medicine (1.00)
- Technology: