AITopics | clinical text

Collaborating Authors

clinical text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TraceCoder: Towards Traceable ICD Coding via Multi-Source Knowledge Integration

Ren, Mucheng, Chen, He, Yan, Yuchen, Hu, Danqing, Xu, Jun, Zeng, Xian

arXiv.org Artificial IntelligenceNov-12-2025

Automated International Classification of Diseases (ICD) coding assigns standardized diagnosis and procedure codes to clinical records, playing a critical role in healthcare systems. However, existing methods face challenges such as semantic gaps between clinical text and ICD codes, poor performance on rare and long-tail codes, and limited interpretability. To address these issues, we propose TraceCoder, a novel framework integrating multi-source external knowledge to enhance traceability and explainability in ICD coding. TraceCoder dynamically incorporates diverse knowledge sources, including UMLS, Wikipedia, and large language models (LLMs), to enrich code representations, bridge semantic gaps, and handle rare and ambiguous codes. It also introduces a hybrid attention mechanism to model interactions among labels, clinical context, and knowledge, improving long-tail code recognition and making predictions interpretable by grounding them in external evidence. Experiments on MIMIC-III-ICD9, MIMIC-IV-ICD9, and MIMIC-IV-ICD10 datasets demonstrate that TraceCoder achieves state-of-the-art performance, with ablation studies validating the effectiveness of its components. TraceCoder offers a scalable and robust solution for automated ICD coding, aligning with clinical needs for accuracy, interpretability, and reliability.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.15267

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.72)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts

Iwase, Naoto, Okuyama, Hiroki, Iwasawa, Junichiro

arXiv.org Artificial IntelligenceNov-4-2025

Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts -- a prerequisite for safe deployment -- remains under-evaluated, particularly beyond English. We introduce MedRECT, a cross-lingual benchmark (Japanese/English) that formulates medical error handling as three subtasks: error detection, error localization (sentence extraction), and error correction. MedRECT is built with a scalable, automated pipeline from the Japanese Medical Licensing Examinations (JMLE) and a curated English counterpart, yielding MedRECT-ja (663 texts) and MedRECT-en (458 texts) with comparable error/no-error balance. We evaluate 9 contemporary LLMs spanning proprietary, open-weight, and reasoning families. Key findings: (i) reasoning models substantially outperform standard architectures, with up to 13.5% relative improvement in error detection and 51.0% in sentence extraction; (ii) cross-lingual evaluation reveals 5-10% performance gaps from English to Japanese, with smaller disparities for reasoning models; (iii) targeted LoRA fine-tuning yields asymmetric improvements in error correction performance (Japanese: +0.078, English: +0.168) while preserving reasoning capabilities; and (iv) our fine-tuned model exceeds human expert performance on structured medical error correction tasks. To our knowledge, MedRECT is the first comprehensive cross-lingual benchmark for medical error correction, providing a reproducible framework and resources for developing safer medical LLMs across languages.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.00421

Country:

North America > United States (0.28)
North America > Mexico (0.28)
Asia > Japan > Honshū (0.28)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Diagnostic Medicine (0.83)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Multilingual Clinical NER for Diseases and Medications Recognition in Cardiology Texts using BERT Embeddings

Danu, Manuela Daniela, Marica, George, Suciu, Constantin, Itu, Lucian Mihai, Farri, Oladimeji

arXiv.org Artificial IntelligenceOct-21-2025

The rapidly increasing volume of electronic health record (EHR) data underscores a pressing need to unlock biomedical knowledge from unstructured clinical texts to support advancements in data-driven clinical systems, including patient diagnosis, disease progression monitoring, treatment effects assessment, prediction of future clinical events, etc. While contextualized language models have demonstrated impressive performance improvements for named entity recognition (NER) systems in English corpora, there remains a scarcity of research focused on clinical texts in low-resource languages. To bridge this gap, our study aims to develop multiple deep contextual embedding models to enhance clinical NER in the cardiology domain, as part of the BioASQ MultiCardioNER shared task. We explore the effectiveness of different monolingual and multilingual BERT-based models, trained on general domain text, for extracting disease and medication mentions from clinical case reports written in English, Spanish, and Italian. We achieved an F1-score of 77.88% on Spanish Diseases Recognition (SDR), 92.09% on Spanish Medications Recognition (SMR), 91.74% on English Medications Recognition (EMR), and 88.9% on Italian Medications Recognition (IMR). These results outperform the mean and median F1 scores in the test leaderboard across all subtasks, with the mean/median values being: 69.61%/75.66% for SDR, 81.22%/90.18% for SMR, 89.2%/88.96% for EMR, and 82.8%/87.76% for IMR.

large language model, machine learning, recognition, (21 more...)

arXiv.org Artificial Intelligence

2510.17437

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FRACCO: A gold-standard annotated corpus of oncological entities with ICD-O-3.1 normalisation

Pignat, Johann, Vucetic, Milena, Gaudet-Blavignac, Christophe, Zaghir, Jamil, Stettler, Amandine, Amrein, Fanny, Bonjour, Jonatan, Goldman, Jean-Philippe, Michielin, Olivier, Lovis, Christian, Bjelogrlic, Mina

arXiv.org Artificial IntelligenceOct-17-2025

Developing natural language processing tools for clinical text requires annotated datasets, yet French oncology resources remain scarce. We present FRACCO (FRench Annotated Corpus for Clinical Oncology) an expert-annotated corpus of 1301 synthetic French clinical cases, initially translated from the Spanish CANTEMIST corpus as part of the FRASIMED initiative. Each document is annotated with terms related to morphology, topography, and histologic differentiation, using the International Classification of Diseases for Oncology (ICD-O) as reference. An additional annotation layer captures composite expression-level normalisations that combine multiple ICD-O elements into unified clinical concepts. Annotation quality was ensured through expert review: 1301 texts were manually annotated for entity spans by two domain experts. A total of 71127 ICD-O normalisations were produced through a combination of automated matching and manual validation by a team of five annotators. The final dataset representing 399 unique morphology codes (from 2549 different expressions), 272 topography codes (from 3143 different expressions), and 2043 unique composite expressions (from 11144 different expressions). This dataset provides a reference standard for named entity recognition and concept normalisation in French oncology texts.

annotation, artificial intelligence, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.13873

Country: Europe (0.29)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Understanding Stigmatizing Language Lexicons: A Comparative Analysis in Clinical Contexts

Zhou, Yiliang, Hu, Di, Lyu, Tianchu, Dhillon, Jasmine, Beck, Alexandra L., Sadigh, Gelareh, Zheng, Kai

arXiv.org Artificial IntelligenceSep-10-2025

Stigmatizing language results in healthcare inequities, yet there is no universally accepted or standardized lexicon defining which words, terms, or phrases constitute stigmatizing language in healthcare. We conducted a systematic search of the literature to identify existing stigmatizing language lexicons and then analyzed them comparatively to examine: 1) similarities and discrepancies between these lexicons, and 2) the distribution of positive, negative, or neutral terms based on an established sentiment dataset. Our search identified four lexicons. The analysis results revealed moderate semantic similarity among them, and that most stigmatizing terms are related to judgmental expressions by clinicians to describe perceived negative behaviors. Sentiment analysis showed a predominant proportion of negatively classified terms, though variations exist across lexicons. Our findings underscore the need for a standardized lexicon and highlight challenges in defining stigmatizing language in clinical texts.

artificial intelligence, lexicon, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.07462

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (0.94)
Health & Medicine > Health Care Technology > Medical Record (0.75)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)

Add feedback

Using LLMs for Multilingual Clinical Entity Linking to ICD-10

Vassileva, Sylvia, Koychev, Ivan, Boytcheva, Svetla

arXiv.org Artificial IntelligenceSep-8-2025

The linking of clinical entities is a crucial part of extracting structured information from clinical texts. It is the process of assigning a code from a medical ontology or classification to a phrase in the text. The International Classification of Diseases - 10th revision (ICD-10) is an international standard for classifying diseases for statistical and insurance purposes. Automatically assigning the correct ICD-10 code to terms in discharge summaries will simplify the work of healthcare professionals and ensure consistent coding in hospitals. Our paper proposes an approach for linking clinical terms to ICD-10 codes in different languages using Large Language Models (LLMs). The approach consists of a multistage pipeline that uses clinical dictionaries to match unambiguous terms in the text and then applies in-context learning with GPT-4.1 to predict the ICD-10 code for the terms that do not match the dictionary. Our system shows promising results in predicting ICD-10 codes on different benchmark datasets in Spanish - 0.89 F1 for categories and 0.78 F1 on subcategories on CodiEsp, and Greek - 0.85 F1 on ElCardioCC.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.04868

Country: Europe (0.29)

Genre: Research Report (0.82)

Industry: Health & Medicine > Health Care Providers & Services (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

A Fully Transformer Based Multimodal Framework for Explainable Cancer Image Segmentation Using Radiology Reports

Adahada, Enobong, Sassoon, Isabel, Hone, Kate, Li, Yongmin

arXiv.org Artificial IntelligenceAug-20-2025

We introduce Med-CTX, a fully transformer based multimodal framework for explainable breast cancer ultrasound segmentation. We integrate clinical radiology reports to boost both performance and interpretability. Med-CTX achieves exact lesion delineation by using a dual-branch visual encoder that combines ViT and Swin transformers, as well as uncertainty aware fusion. Clinical language structured with BI-RADS semantics is encoded by BioClinicalBERT and combined with visual features utilising cross-modal attention, allowing the model to provide clinically grounded, model generated explanations. Our methodology generates segmentation masks, uncertainty maps, and diagnostic rationales all at once, increasing confidence and transparency in computer assisted diagnosis. On the BUS-BRA dataset, Med-CTX achieves a Dice score of 99% and an IoU of 95%, beating existing baselines U-Net, ViT, and Swin. Clinical text plays a key role in segmentation accuracy and explanation quality, as evidenced by ablation studies that show a -5.4% decline in Dice score and -31% in CIDEr. Med-CTX achieves good multimodal alignment (CLIP score: 85%) and increased confi dence calibration (ECE: 3.2%), setting a new bar for trustworthy, multimodal medical architecture.

large language model, machine learning, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2508.13796

Country: Europe (0.46)

Genre: Research Report (0.41)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks

Noori, Ali, Devkota, Pratik, Mohanty, Somya, Manda, Prashanti

arXiv.org Artificial IntelligenceAug-5-2025

Automated annotation of clinical text with standardized medical concepts is critical for enabling structured data extraction and decision support. SNOMED CT provides a rich ontology for labeling clinical entities, but manual annotation is labor-intensive and impractical at scale. This study introduces a neural sequence labeling approach for SNOMED CT concept recognition using a Bidirectional GRU model. Leveraging a subset of MIMIC-IV, we preprocess text with domain-adapted SpaCy and SciBERT-based tokenization, segmenting sentences into overlapping 19-token chunks enriched with contextual, syntactic, and morphological features. The Bi-GRU model assigns IOB tags to identify concept spans and achieves strong performance with a 90 percent F1-score on the validation set. These results surpass traditional rule-based systems and match or exceed existing neural models. Qualitative analysis shows effective handling of ambiguous terms and misspellings. Our findings highlight that lightweight RNN-based architectures can deliver high-quality clinical concept annotation with significantly lower computational cost than transformer-based models, making them well-suited for real-world deployment.

annotation, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.02556

Country:

North America > United States > North Carolina (0.28)
North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.31)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

clinical text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

d1ebc73cd4c88866b97a6851ece739d1-Paper-Conference.pdf

TraceCoder: Towards Traceable ICD Coding via Multi-Source Knowledge Integration

MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts

Multilingual Clinical NER for Diseases and Medications Recognition in Cardiology Texts using BERT Embeddings

FRACCO: A gold-standard annotated corpus of oncological entities with ICD-O-3.1 normalisation

d1ebc73cd4c88866b97a6851ece739d1-Paper-Conference.pdf

Understanding Stigmatizing Language Lexicons: A Comparative Analysis in Clinical Contexts

Using LLMs for Multilingual Clinical Entity Linking to ICD-10

A Fully Transformer Based Multimodal Framework for Explainable Cancer Image Segmentation Using Radiology Reports

Automated SNOMED CT Concept Annotation in Clinical Text Using Bi-GRU Neural Networks