disease name
Data Augmentation Techniques for Chinese Disease Name Normalization
Cui, Wenqian, Fu, Xiangling, Liu, Shaohui, Gu, Mingjun, Liu, Xien, Wu, Ji, King, Irwin
Disease name normalization is an important task in the medical domain. It classifies disease names written in various formats into standardized names, serving as a fundamental component in smart healthcare systems for various disease-related functions. Nevertheless, the most significant obstacle to existing disease name normalization systems is the severe shortage of training data. Consequently, we present a novel data augmentation approach that includes a series of data augmentation techniques and some supporting modules to help mitigate the problem. Through extensive experimentation, we show that our proposed approach yields significant performance improvements across various baseline models and training objectives, particularly in scenarios with limited training data.
DisEmbed: Transforming Disease Understanding through Embeddings
When it comes to understanding diseases, many existing models, such as ClinicalBERT and BioBERT, struggle due to their broad generalization across the medical domain. While these models perform well in general healthcare contexts, they often fail to capture the nuanced relationships between specific diseases and their symptoms. For example, in use cases like Clinical Decision Support, disease diagnosis systems, and disease categorization based on symptoms, these models fall short. They can identify that a given text is related to the medical field, but they often do not understand whether the entities in the text are directly related. For instance, while both "brain surgery" and "Parkinson's disease" are medical terms, a general or medical-domain model might mistakenly associate them because it treats both as medical concepts, leading to high cosine similarity, even though they are unrelated. To address this gap, I have curated a synthetic dataset focused solely on diseases, where the descriptions and symptoms are not explicitly labeled with symptom names. This forces the model to learn deeper and more precise associations rather than rely solely on superficial medical terminology. Although there is an inherent understanding of the correlations between symptoms and diseases, this approach promotes a more focused and accurate understanding of the disease.
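The cosine similarity mentioned in the abstract above can be illustrated with a minimal sketch. The three-dimensional vectors below are hypothetical values invented for illustration; they are not embeddings from DisEmbed, ClinicalBERT, or BioBERT, whose real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumption): two medical terms that a broad
# model places close together, and one unrelated piece of text.
brain_surgery = [0.9, 0.1, 0.3]
parkinsons_disease = [0.8, 0.2, 0.4]
unrelated_text = [-0.1, 0.9, -0.3]

sim_medical = cosine_similarity(brain_surgery, parkinsons_disease)
sim_other = cosine_similarity(brain_surgery, unrelated_text)
print(f"medical pair: {sim_medical:.3f}, unrelated pair: {sim_other:.3f}")
```

With these toy vectors the two medical terms score much higher than the unrelated pair, which is the kind of surface-level association the abstract says a disease-focused model should avoid.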
Towards System Modelling to Support Diseases Data Extraction from the Electronic Health Records for Physicians Research Activities
Alsaqer, Bushra F., Alsaqer, Alaa F., Asif, Amna
The use of Electronic Health Records (EHRs) has increased dramatically in the past 15 years, as they are considered an important source for managing patient data. EHRs are primary sources of disease diagnosis and demographic data of patients worldwide. Therefore, the data can be utilized for secondary tasks such as research. This paper aims to make such data usable for research activities such as monitoring disease statistics for a specific population. As a result, researchers can detect disease causes linked to the behavior and lifestyle of the target group. One of the limitations of EHR systems is that the data is not available in a standard format but in various forms. Therefore, the disease names and demographic data must first be converted into one standardized form to make them usable for research activities. A large number of EHRs are available, and solving the standardization issues requires optimized techniques. We used a first-hand EHR dataset extracted from EHR systems. Our application uploads the dataset from the EHRs and converts it to the ICD-10 coding system to solve the standardization problem. We first apply the steps of pre-processing, annotation, and transformation to convert the data into the standard form. Data pre-processing is applied to normalize demographic formats. In the annotation step, a machine learning model is used to recognize the diseases in the text. The transformation step then converts each disease name to the ICD-10 coding format. The model was evaluated manually by comparing its disease recognition performance with an available dictionary-based system (MetaMap). The accuracy of the proposed machine learning model is 81%, which outperforms MetaMap's accuracy of 67%. This paper contributes to system modelling for EHR data extraction to support research activities.
- North America > United States (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Asia > Middle East > Saudi Arabia > Eastern Province > Al-Ahsa Governorate > Al-Hofuf (0.04)
- Africa > Kenya (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.31)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
- Information Technology > Data Science > Data Mining > Text Mining (0.84)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.68)
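The three steps named in the EHR abstract above (pre-processing, annotation, transformation to ICD-10) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' system: the annotation step here is a plain substring match standing in for their machine learning recognizer, the pre-processing only normalizes case and whitespace rather than demographic formats, and the lookup table holds just three ICD-10 categories. All function and variable names are hypothetical.

```python
import re

# Tiny illustrative lookup (assumption); a real system would load the
# full ICD-10 code list rather than three hand-picked categories.
ICD10_BY_NAME = {
    "type 2 diabetes mellitus": "E11",
    "essential hypertension": "I10",
    "asthma": "J45",
}

def preprocess(text):
    """Step 1: lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())

def annotate(text):
    """Step 2: find disease mentions.

    Stand-in for the ML recognizer described in the abstract: this
    version only reports known names that occur as substrings.
    """
    return [name for name in ICD10_BY_NAME if name in text]

def transform(disease_names):
    """Step 3: map each recognized disease name to its ICD-10 code."""
    return [ICD10_BY_NAME[name] for name in disease_names]

def normalize_record(text):
    """Run the three steps in order on one free-text record."""
    return transform(annotate(preprocess(text)))

codes = normalize_record("Patient has  Type 2 Diabetes Mellitus and Asthma.")
print(codes)
```

Keeping the three steps as separate functions mirrors the abstract's description and would let the substring matcher be swapped for a trained recognizer without touching the other two steps.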
From Keywords to Structured Summaries: Streamlining Scholarly Knowledge Access
Shamsabadi, Mahsa, D'Souza, Jennifer
This short paper highlights the growing importance of information retrieval (IR) engines in the scientific community, addressing the inefficiency of traditional keyword-based search engines due to the rising volume of publications. The proposed solution involves structured records, underpinning advanced information technology (IT) tools, including visualization dashboards, to revolutionize how researchers access and filter articles, replacing the traditional text-heavy approach. This vision is exemplified through a proof of concept centered on the "reproductive number estimate of infectious diseases" research theme, using a fine-tuned large language model (LLM) to automate the creation of structured records to populate a backend database that now goes beyond keywords. The result is a next-generation IR method accessible at https://orkg.org/usecases/r0-estimates.
- South America > Colombia (0.04)
- Europe > Slovenia (0.04)
- Europe > France (0.04)
- Overview (0.88)
- Research Report (0.83)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology
Shamsabadi, Mahsa, D'Souza, Jennifer, Auer, Sören
In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generation capabilities of LLMs to produce structured scholarly contribution summaries, offering both a practical solution and insights into LLMs' emergent abilities. For LLMs, the prime focus is on improving their general intelligence as conversational agents. We argue that these models can also be applied effectively in information extraction (IE), specifically in complex IE tasks within terse domains like Science. This paradigm shift replaces the traditional modular, pipelined machine learning approach with a simpler objective expressed through instructions. Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
- Asia > China > Guangdong Province > Shenzhen (0.07)
- Oceania > Australia (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
Exploring semantic information in disease: Simple Data Augmentation Techniques for Chinese Disease Normalization
Cui, Wenqian, Liu, Shaohui, Fu, Xiangling, Liu, Xien, Wu, Ji
Disease is a core concept in the medical field, and the task of normalizing disease names is the basis of all disease-related tasks. However, due to the multi-axis and multi-grain nature of disease names, general text data augmentation techniques often inject incorrect information and harm performance. To address this problem, we propose a set of data augmentation techniques that work together as an augmented training task for disease normalization. Our data augmentation methods are based on both the clinical disease corpus and the standard disease corpus derived from ICD-10 coding. Extensive experiments are conducted to show the effectiveness of our proposed methods. The results demonstrate that our methods can achieve up to a 3% performance gain compared to non-augmented counterparts, and they can work even better on smaller datasets.
Is In-hospital Meta-information Useful for Abstractive Discharge Summary Generation?
Ando, Kenichiro, Komachi, Mamoru, Okumura, Takashi, Horiguchi, Hiromasa, Matsumoto, Yuji
During the patient's hospitalization, the physician must record daily observations of the patient and summarize them into a brief document called a "discharge summary" when the patient is discharged. Automated generation of discharge summaries can greatly relieve the physicians' burden, and has been addressed recently in the research community. Most previous studies of discharge summary generation using the sequence-to-sequence architecture focus only on inpatient notes as input. However, electronic health records (EHRs) also have rich structured metadata (e.g., hospital, physician, disease, length of stay) that might be useful. This paper investigates the effectiveness of medical meta-information for summarization tasks. We obtain four types of meta-information from the EHR systems and encode each type into a sequence-to-sequence model. Using Japanese EHRs, meta-information encoded models increased ROUGE-1 by up to 4.45 points and BERTScore by 3.77 points over the vanilla Longformer. We also found that the encoded meta-information improves the precision of its related terms in the outputs. Our results show the benefit of using medical meta-information.
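For readers unfamiliar with the ROUGE-1 metric reported above, the following sketch computes it as clipped unigram overlap between a generated summary and a reference. It assumes the inputs are already tokenized; the English example sentences are invented for illustration, and Japanese text such as that used in the paper would need a proper tokenizer first. The function name `rouge1` is hypothetical and this is not the evaluation code used by the authors.

```python
from collections import Counter

def rouge1(candidate_tokens, reference_tokens):
    """Return (precision, recall, F1) based on clipped unigram overlap."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Counter intersection keeps, per token, the smaller of the two counts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example sentences (assumption), split on whitespace.
generated = "the patient was discharged".split()
reference = "the patient was discharged home".split()
precision, recall, f1 = rouge1(generated, reference)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Scores like the "4.45 points" in the abstract are usually these fractions multiplied by 100.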
Medical Entity Linking using Triplet Network
Mondal, Ishani, Purkayastha, Sukannya, Sarkar, Sudeshna, Goyal, Pawan, Pillai, Jitesh, Bhattacharyya, Amitava, Gattu, Mahanandeeshwar
Entity linking (or normalization) is an essential task in text mining that maps the entity mentions in medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain. It can also be used for merging different medical and clinical ontologies. In this paper, we focus on the problem of disease linking or normalization. This task is executed in two phases: candidate generation and candidate scoring. We present an approach to rank the candidate Knowledge Base entries based on their similarity with the disease mention. We make use of the Triplet Network for candidate ranking. While existing methods have used carefully generated sieves and external resources for candidate generation, we introduce a robust and portable candidate generation scheme that does not make use of hand-crafted rules. Experimental results on the standard benchmark NCBI disease dataset demonstrate that our system outperforms the prior methods by a significant margin.
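The idea behind ranking candidates with a triplet network can be sketched as follows. The abstract does not state the loss or distance function, so this example assumes the common triplet margin loss with Euclidean distance; the vectors and all names are hypothetical, and the encoder that would produce the vectors is omitted.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def rank_candidates(mention_vec, candidates):
    """Sort (name, vector) KB candidates by distance to the mention."""
    return sorted(candidates, key=lambda c: euclidean(mention_vec, c[1]))

# Hypothetical 2-dimensional vectors (assumption) for one mention
# and two knowledge-base candidates.
mention = [0.0, 0.0]
kb_candidates = [
    ("wrong entry", [3.0, 4.0]),
    ("correct entry", [0.1, 0.0]),
]

ranked = rank_candidates(mention, kb_candidates)
loss = triplet_loss(mention, kb_candidates[1][1], kb_candidates[0][1])
print([name for name, _ in ranked], loss)
```

Training drives this loss toward zero so that, at inference time, sorting candidates by distance to the mention places the correct knowledge-base entry first.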
Disease Normalization with Graph Embeddings
Pujary, Dhruba, Thorne, Camilo, Aziz, Wilker
The detection and normalization of diseases in biomedical texts are key biomedical natural language processing tasks. Disease names need not only be identified, but also normalized or linked to clinical taxonomies describing diseases such as MeSH. In this paper we describe deep learning methods that tackle both tasks. We train and test our methods on the known NCBI disease benchmark corpus. We propose to represent disease names by leveraging MeSH's graphical structure together with the lexical information available in the taxonomy using graph embeddings. We also show that combining neural named entity recognition models with our graph-based entity linking methods via multitask learning leads to improved disease recognition in the NCBI corpus.