medical text
Towards Domain Specification of Embedding Models in Medicine
Khodadad, Mohammad, Kasmaee, Ali Shiraee, Astaraki, Mahdi, Mahyar, Hamidreza
Medical text embedding models are foundational to a wide array of healthcare applications, ranging from clinical decision support and biomedical information retrieval to medical question answering, yet they remain hampered by two critical shortcomings. First, most models are trained on a narrow slice of medical and biological data and rely on outdated methodology, making them ill-suited to capture the diversity of terminology and semantics encountered in practice. Second, existing evaluations are often inadequate: even widely used benchmarks fail to generalize across the full spectrum of real-world medical tasks. To address these gaps, we leverage MEDTE, a GTE model extensively fine-tuned on diverse medical corpora through self-supervised contrastive learning across multiple data sources, to deliver robust medical text embeddings. Alongside this model, we propose a comprehensive benchmark suite of 51 tasks spanning classification, clustering, pair classification, and retrieval, modeled on the Massive Text Embedding Benchmark (MTEB) but tailored to the nuances of medical text. Our results demonstrate that this combined approach not only establishes a robust evaluation framework but also yields embeddings that consistently outperform state-of-the-art alternatives across different tasks.
- North America > Canada > Ontario > Hamilton (0.14)
- North America > United States (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
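The abstract above mentions self-supervised contrastive learning without stating the exact objective. As a rough illustration only, the sketch below computes a generic InfoNCE loss with in-batch negatives; the function name `info_nce_loss`, the `temperature` default, and the toy vectors are assumptions for this example and are not taken from the MEDTE paper.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (illustrative sketch).

    anchors[i] and positives[i] form a matching pair; every other row of
    `positives` acts as a negative for anchors[i]. Vectors must be non-zero.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [
            sum(x * y for x, y in zip(anchor, cand)) / temperature
            for cand in p
        ]
        # Numerically stable log-sum-exp over the candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    pairs = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(pairs, pairs))        # matched pairs: loss near 0
    print(info_nce_loss(pairs, pairs[::-1]))  # mismatched pairs: large loss
```

The loss is small when each anchor is closest to its own positive and grows when another item in the batch is closer, which is the property a contrastive fine-tuning stage relies on.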
Research on Medical Named Entity Identification Based On Prompt-Biomrc Model and Its Application in Intelligent Consultation System
This study is dedicated to exploring the application of prompt learning methods to advance Named Entity Recognition (NER) within the medical domain. In recent years, the emergence of large-scale models has driven significant progress in NER tasks, particularly with the introduction of the BioBERT language model, which has greatly enhanced NER capabilities in medical texts. Our research introduces the Prompt-bioMRC model, which integrates both hard template and soft prompt designs aimed at refining the precision and efficiency of medical entity recognition. Through extensive experimentation across diverse medical datasets, our findings consistently demonstrate that our approach surpasses traditional models. This enhancement not only validates the efficacy of our methodology but also highlights its potential to provide reliable technological support for applications like intelligent diagnosis systems. By leveraging advanced NER techniques, this study contributes to advancing automated medical data processing, facilitating more accurate medical information extraction, and supporting efficient healthcare decision-making processes.
- North America > United States (0.04)
- Asia > China (0.04)
Medalyze: Lightweight Medical Report Summarization Application Using FLAN-T5-Large
Nguyen, Van-Tinh, Pham, Hoang-Duong, To, Thanh-Hai, Do, Cong-Tuan Hung, Dong, Thi-Thu-Trang, Le, Vu-Trung Duong, Hoang, Van-Phuc
Understanding medical texts presents significant challenges due to complex terminology and context-specific language. This paper introduces Medalyze, an AI-powered application designed to enhance the comprehension of medical texts using three specialized FLAN-T5-Large models. These models are fine-tuned for (1) summarizing medical reports, (2) extracting health issues from patient-doctor conversations, and (3) identifying the key question in a passage. Medalyze is deployed across web and mobile platforms with real-time inference, leveraging a scalable API and YugabyteDB. Experimental evaluations demonstrate the system's superior summarization performance over GPT-4 in domain-specific tasks, based on metrics such as BLEU, ROUGE-L, BERTScore, and SpaCy Similarity. Medalyze provides a practical, privacy-preserving, and lightweight solution for improving information accessibility in healthcare.
- Asia > Vietnam > Hanoi > Hanoi (0.15)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Nagasaki Prefecture > Nagasaki (0.04)
- Workflow (1.00)
- Research Report (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric
Picha, Sayeh Gholipour, Chanti, Dawood Al, Caplier, Alice
Medical language processing and deep learning techniques have emerged as critical tools for improving healthcare, particularly in the analysis of medical imaging and medical text data. These multimodal data fusion techniques help to improve the interpretation of medical imaging and lead to increased diagnostic accuracy, informed clinical decisions, and improved patient outcomes. The success of these models relies on the ability to extract and consolidate semantic information from clinical text. This paper addresses the need for more robust methods to evaluate the semantic content of medical reports. Conventional natural language processing approaches and metrics were originally designed to capture semantic context in general natural language and machine translation, and they often fail to capture the complex semantic meanings inherent in medical content. In this study, we introduce a novel approach designed specifically for assessing the semantic similarity between generated medical reports and the ground truth. Our approach is validated, demonstrating its efficiency in assessing domain-specific semantic similarity within medical contexts. By applying our metric to state-of-the-art Chest X-ray report generation models, we obtain results that not only align with conventional metrics but also provide more contextually meaningful scores in the considered medical domain.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.07)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
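The title of the entry above refers to a cosine-based metric, but the abstract does not say how reports are turned into vectors. The sketch below shows only the cosine similarity step itself, assuming each report has already been mapped to a fixed-length embedding; the function name `cosine_similarity` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros, a convention chosen here
    to avoid division by zero.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical embeddings of a generated report and its ground truth.
    generated = [0.2, 0.7, 0.1]
    reference = [0.25, 0.65, 0.05]
    print(round(cosine_similarity(generated, reference), 4))
```

Because the score depends only on vector direction, it is insensitive to embedding magnitude, which is one reason cosine similarity is a common basis for comparing text embeddings.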
A LongFormer-Based Framework for Accurate and Efficient Medical Text Summarization
Sun, Dan, He, Jacky, Zhang, Hanlu, Qi, Zhen, Zheng, Hongye, Wang, Xiaokai
This paper proposes a medical text summarization method based on LongFormer, aimed at addressing the challenges faced by existing models when processing long medical texts. Traditional summarization methods are often limited by short-term memory, leading to information loss or reduced summary quality in long texts. LongFormer, by introducing long-range self-attention, effectively captures long-range dependencies in the text, retaining more key information and improving the accuracy and information retention of summaries. Experimental results show that the LongFormer-based model outperforms traditional models, such as RNN, T5, and BERT, on automatic evaluation metrics such as ROUGE. It also receives high scores in expert evaluations, particularly excelling in information retention and grammatical accuracy. However, there is still room for improvement in conciseness and readability: some experts noted that the generated summaries contain redundant information. Future research will focus on further optimizing the model structure to enhance conciseness and fluency, achieving more efficient medical text summarization. As medical data continues to grow, automated summarization technology will play an increasingly important role in fields such as medical research, clinical decision support, and knowledge management.
- North America > United States (0.15)
- Asia > China (0.14)
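The entry above reports results on ROUGE without specifying the variant or tooling. As a simplified illustration, the sketch below computes a ROUGE-L style F1 score from the longest common subsequence of whitespace tokens; the function names and the lowercase whitespace tokenization are assumptions, and published ROUGE implementations typically add stemming and more careful tokenization.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 between a reference and a candidate summary."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = "the patient has a mild fever"
    cand = "patient has fever"
    print(round(rouge_l_f1(ref, cand), 3))
```

A short summary that preserves word order from the reference scores high precision but lower recall, which is why the two are combined into a single F-measure.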
Iterative Tree Analysis for Medical Critics
Huang, Zenan, Li, Mingwei, Zhou, Zheng, Jiang, Youxin
Large Language Models (LLMs) have been widely adopted across various domains, yet their application in the medical field poses unique challenges, particularly concerning the generation of hallucinations. Hallucinations in open-ended long medical text manifest as misleading critical claims, which are difficult to verify for two reasons. First, critical claims are often deeply entangled within the text and cannot be extracted based solely on surface-level presentation. Second, verifying these claims is challenging because surface-level token-based retrieval often lacks precise or specific evidence, leaving the claims unverifiable without deeper mechanism-based analysis. In this paper, we introduce a novel method termed Iterative Tree Analysis (ITA) for medical critics. ITA is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process. This process combines top-down task decomposition with bottom-up evidence consolidation, enabling precise verification of complex medical claims through detailed mechanism-level reasoning. Our extensive experiments demonstrate that ITA outperforms previous methods by 10% in detecting factual inaccuracies in complex medical text verification tasks. Additionally, we will release a comprehensive test set to the public, aiming to foster further advancements in research within this domain.
Accurate Medical Named Entity Recognition Through Specialized NLP Models
Hu, Jiacheng, Bao, Runyuan, Lin, Yang, Zhang, Hanchao, Xiang, Yanlin
This study evaluated the effect of BioBERT in medical text processing for the task of medical named entity recognition. Through comparative experiments with models such as BERT, ClinicalBERT, SciBERT, and BlueBERT, the results showed that BioBERT achieved the best performance in both precision and F1 score, verifying its applicability and superiority in the medical field. BioBERT enhances its ability to understand professional terms and complex medical texts through pre-training on biomedical data, providing a powerful tool for medical information extraction and clinical decision support. The study also explored the privacy and compliance challenges of BioBERT when processing medical data, and proposed future research directions for combining other medical-specific models to improve generalization and robustness. With the development of deep learning technology, the potential of BioBERT in application fields such as intelligent medicine, personalized treatment, and disease prediction will be further expanded. Future research can focus on the real-time performance and interpretability of the model to promote its widespread application in the medical field.
- North America > United States > New York (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Middle East > Israel (0.04)
LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation
Keles, Bunyamin, Gunay, Murat, Caglar, Serdar I.
Machine translation is indispensable in healthcare for enabling the global dissemination of medical knowledge across languages. However, complex medical terminology poses unique challenges to achieving adequate translation quality and accuracy. This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized specifically for medical texts. While large language models (LLMs) have demonstrated powerful capabilities, this research shows that small, specialized models trained on high-quality in-domain (mostly synthetic) data can outperform even vastly larger LLMs. Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts. Our LLMs-in-the-loop methodology employs synthetic data generation, rigorous evaluation, and agent orchestration to enhance performance. We developed small medical translation models using the MarianMT base model. We introduce a new medical translation test dataset to standardize evaluation in this domain. Assessed using BLEU, METEOR, ROUGE, and BERT scores on this test set, our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo. Results demonstrate that our LLMs-in-the-loop approach, combined with fine-tuning on high-quality, domain-specific data, enables specialized models to outperform general-purpose and some larger systems. This research, part of a broader series on expert small models, paves the way for future healthcare-related AI developments, including deidentification and bio-medical entity extraction models. Our study underscores the potential of tailored neural translation models and the LLMs-in-the-loop methodology to advance the field through improved data generation, evaluation, agent orchestration, and modeling techniques.
- Europe > Finland > Uusimaa > Helsinki (0.05)
- North America > United States (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Causality extraction from medical text using Large Language Models (LLMs)
Gopalakrishnan, Seethalakshmi, Garbayo, Luciana, Zadrozny, Wlodek
This study explores the potential of natural language models, including large language models, to extract causal relations from medical texts, specifically from Clinical Practice Guidelines (CPGs). The outcomes of causality extraction from Clinical Practice Guidelines for gestational diabetes are presented, marking a first in the field. We report on a set of experiments using variants of BERT (BioBERT, DistilBERT, and BERT) and using Large Language Models (LLMs), namely GPT-4 and LLAMA2. Our experiments show that BioBERT performed better than other models, including the Large Language Models, with an average F1-score of 0.72. GPT-4 and LLAMA2 results show similar performance but less consistency. We also release the code and an annotated corpus of causal statements within the Clinical Practice Guidelines for gestational diabetes.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.14)
- North America > United States > North Carolina (0.04)
- North America > United States > New York > New York County > New York City (0.04)
SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology
Kulyabin, Mikhail, Sokolov, Gleb, Galaida, Aleksandr, Maier, Andreas, Arias-Vergara, Tomas
The extraction and analysis of insights from medical data, primarily stored in free-text formats by healthcare workers, presents significant challenges due to its unstructured nature. Medical coding, a crucial process in healthcare, remains minimally automated due to the complexity of medical ontologies and restricted access to medical texts for training Natural Language Processing models. In this paper, we propose a method, "SNOBERT," for linking text spans in clinical notes to specific concepts in SNOMED CT using BERT-based models. The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available datasets of labeled clinical notes. SNOBERT outperforms other classical methods based on deep learning, as confirmed by the results of a challenge in which it was applied.
- North America > United States > Massachusetts (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)