Collaborating Authors

 Liao, Wenxiong


Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis

arXiv.org Artificial Intelligence

The analysis of medical texts is a key component of healthcare informatics, where the accurate comparison and interpretation of documents can significantly impact patient care and medical research. Traditionally, this analysis has leveraged lexical comparison metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [1] and BLEU (Bilingual Evaluation Understudy) [2], which have become standard tools in the evaluation of text similarity within the domain of natural language processing (NLP). ROUGE and BLEU were initially designed to assess the quality of automatic summarization and machine translation, respectively, by measuring the overlap of n-grams between the generated texts and reference texts. While these metrics have been instrumental in advancing NLP applications, their application in medical text analysis reveals inherent limitations. Specifically, ROUGE and BLEU focus predominantly on surface-level lexical similarities, often overlooking the deep semantic meanings and clinical implications embedded within medical documents. This gap in capturing the essence and context of medical language presents a significant challenge in leveraging these metrics for meaningful analysis in healthcare. Recognizing these limitations, this research proposes a novel methodology that employs GPT-4, a state-of-the-art large language model, for a more sophisticated analysis of medical texts. GPT-4's advanced understanding of context and semantics [3, 4, 5] offers an opportunity to transcend the boundaries of traditional lexical analysis, enabling a deeper, more meaningful comparison of medical documents [6, 7]. This approach not only addresses the shortcomings of ROUGE and BLEU but also aligns with the evolving needs of medical data analysis, where the accurate interpretation of texts is paramount.
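The surface-level limitation described above is easy to demonstrate. The sketch below implements the unigram-overlap F1 at the core of ROUGE-1 (simplified: no stemming or stopword handling) and applies it to two invented report sentences that differ in a single, clinically critical negation; the lexical score stays high even though the clinical meaning is inverted.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1, the core of ROUGE-1 (simplified sketch:
    whitespace tokenization, no stemming or stopword removal)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Two (invented) reports that differ only in the negation "no":
# they share almost all unigrams, so the lexical score stays high
# despite opposite clinical conclusions.
ref = "ct scan shows no evidence of metastatic disease"
hyp = "ct scan shows evidence of metastatic disease"
score = rouge1_f1(hyp, ref)  # roughly 0.93
```

An LLM-based comparison, by contrast, can be prompted to reason about clinical equivalence before scoring, which is the gap the proposed methodology targets.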


The Radiation Oncology NLP Database

arXiv.org Artificial Intelligence

We present the Radiation Oncology NLP Database (ROND), the first dedicated Natural Language Processing (NLP) dataset for radiation oncology, an important medical specialty that has received limited attention from the NLP community in the past. With the advent of Artificial General Intelligence (AGI), there is an increasing need for specialized datasets and benchmarks to facilitate research and development. ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration. It encompasses various NLP tasks including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations, each with a distinct focus on radiation oncology concepts and application cases. In addition, we have developed an instruction-tuning dataset consisting of over 20k instruction pairs (based on ROND) and trained a large language model, CancerChat. This serves to demonstrate the potential of instruction-tuning large language models within a highly-specialized medical domain. The evaluation results in this study could serve as baseline results for future research. ROND aims to stimulate advancements in radiation oncology and clinical NLP by offering a platform for testing and improving algorithms and models in a domain-specific context. The ROND dataset is a joint effort of multiple U.S. health institutions. The data is available at https://github.com/zl-liu/Radiation-Oncology-NLP-Database.
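To make the instruction-tuning component concrete, the sketch below shows one plausible instruction pair and a JSON Lines serializer, a common storage format for such corpora. The field names, the example note, and the label are all hypothetical; the released ROND dataset may use a different schema.

```python
import json

# Hypothetical shape of one ROND-style instruction pair; the actual
# schema of the released dataset may differ.
pair = {
    "instruction": "Classify the treatment intent of the following radiation oncology note.",
    "input": "Patient to receive 30 Gy in 10 fractions to the thoracic spine for pain relief.",
    "output": "Palliative",
}

def to_jsonl(pairs):
    """Serialize instruction pairs to JSON Lines (one JSON object
    per line), a common format for instruction-tuning corpora."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)

record = to_jsonl([pair])
```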


Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

arXiv.org Artificial Intelligence

Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database. The proposed approach combines pre-trained language models with a Graph Neural Network (GNN): a graph is constructed from the speech transcript, and the GNN extracts features from it for AD detection. Data augmentation techniques, including synonym replacement and a GPT-based augmenter, were used to address the small dataset size. Audio data was also introduced, and the WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and contrasting the result with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
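The abstract does not specify how the transcript graph is built, so the sketch below uses a simple word co-occurrence scheme as an illustrative stand-in: nodes are word types and edges connect words appearing within a sliding window, yielding the kind of graph a GNN could then featurize.

```python
from collections import defaultdict

def transcript_to_graph(transcript: str, window: int = 2):
    """Build a word co-occurrence graph from a transcript: nodes are
    word types, weighted edges connect words within a sliding window.
    (An illustrative stand-in; the paper's actual graph construction
    is not described in the abstract.)"""
    tokens = transcript.lower().split()
    nodes = sorted(set(tokens))
    edges = defaultdict(int)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            a, b = sorted((tok, tokens[j]))
            if a != b:  # skip self-loops
                edges[(a, b)] += 1
    return nodes, dict(edges)

nodes, edges = transcript_to_graph("the cat sat on the mat")
```

The resulting `(nodes, edges)` pair could be fed to any message-passing GNN layer; edge weights capture local co-occurrence strength.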


Differentiate ChatGPT-generated and Human-written Medical Texts

arXiv.org Artificial Intelligence

Background: Large language models such as ChatGPT are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the Internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to healthcare and the general public. Objective: This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT, and on designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT. Results: Medical texts written by humans are more concrete, more diverse, and typically contain more useful information, while medical texts generated by ChatGPT pay more attention to fluency and logic, and usually express general terminologies rather than effective information specific to the context of the problem. A BERT-based model can effectively detect medical texts generated by ChatGPT, with an F1 score exceeding 95%.
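One of the linguistic features the study examines, vocabulary diversity, has a standard one-line proxy: the type-token ratio. The sketch below is illustrative only; the two example sentences are invented, and the paper's full workflow uses many more features plus a BERT-based classifier.

```python
def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique tokens over total tokens. A simple
    proxy for the vocabulary diversity the study found to be higher
    in human-written medical texts."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Invented examples: a terse human-style note vs. a repetitive,
# fluent but generic generated-style sentence.
human = "mri brain w/o contrast: chronic lacunar infarct, no acute bleed"
generated = ("It is important to note that further evaluation is important "
             "and it is recommended that further evaluation is performed")
```

On these toy inputs the human-style note scores higher, matching the direction of the paper's finding, though a real detector would combine many such features rather than threshold one.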


AugGPT: Leveraging ChatGPT for Text Data Augmentation

arXiv.org Artificial Intelligence

Text data augmentation is an effective strategy for overcoming the challenge of limited sample sizes in many natural language processing (NLP) tasks. This challenge is especially prominent in the few-shot learning scenario, where the data in the target domain is generally much scarcer and of lower quality. A natural and widely-used strategy to mitigate such challenges is to perform data augmentation to better capture the data invariance and increase the sample size. However, current text data augmentation methods either cannot ensure the correct labeling of the generated data (lacking faithfulness) or cannot ensure sufficient diversity in the generated data (lacking compactness), or both. Inspired by the recent success of large language models, especially ChatGPT and its improved language comprehension abilities, we propose in this work a text data augmentation approach based on ChatGPT (named AugGPT). AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples. The augmented samples can then be used in downstream model training. Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach over state-of-the-art text data augmentation methods in terms of testing accuracy and distribution of the augmented samples.
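The augmentation loop itself is simple and label-preserving: each training sample is rephrased into several variants that inherit the original label. The sketch below factors the LLM call out into a pluggable `rephrase_fn` so it can be demonstrated with a deterministic stand-in; the interface and the toy rephraser are assumptions, not the paper's actual prompting setup.

```python
def augment_dataset(samples, rephrase_fn, n_variants=2):
    """AugGPT-style augmentation loop: each (text, label) sample is
    rephrased into n_variants paraphrases that keep the original
    label. `rephrase_fn(text, i)` stands in for a ChatGPT call
    (hypothetical interface; prompting details omitted)."""
    augmented = []
    for text, label in samples:
        augmented.append((text, label))  # keep the original sample
        for i in range(n_variants):
            augmented.append((rephrase_fn(text, i), label))
    return augmented

# Deterministic stand-in for an LLM rephraser, for demonstration only.
def toy_rephrase(text, i):
    return f"[paraphrase {i}] {text}"

out = augment_dataset([("the drug reduced tumor size", "positive")],
                      toy_rephrase)
```

Because labels are copied rather than re-predicted, faithfulness hinges entirely on the quality of the rephraser, which is exactly the property the paper argues ChatGPT provides.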


Coarse-to-fine Knowledge Graph Domain Adaptation based on Distantly-supervised Iterative Training

arXiv.org Artificial Intelligence

Modern supervised learning neural network models require a large amount of manually labeled data, which makes the construction of domain-specific knowledge graphs time-consuming and labor-intensive. In parallel, although there has been much research on named entity recognition and relation extraction based on distantly supervised learning, constructing a domain-specific knowledge graph from large collections of textual data without manual annotations is still an urgent problem to be solved. In response, we propose an integrated framework for adapting and re-learning knowledge graphs from one coarse domain (biomedical) to a finer-grained domain (oncology). In this framework, we apply distant supervision to cross-domain knowledge graph adaptation. Consequently, no manual data annotation is required to train the model. We introduce a novel iterative training strategy to facilitate the discovery of domain-specific named entities and triples. Experimental results indicate that the proposed framework can perform domain adaptation and knowledge graph construction efficiently.
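The core of distant supervision is labeling by dictionary match rather than by annotator. The sketch below shows that labeling step on an invented seed dictionary and sentence; in an iterative scheme like the one described, high-confidence mentions found by the trained model would be added back to the dictionary for the next round (that loop is omitted here).

```python
def distant_label(sentences, entity_dict):
    """Distant-supervision sketch: label entity mentions by matching
    a seed dictionary against raw sentences, so no manual annotation
    is needed. In an iterative setup, newly discovered high-confidence
    mentions would be merged back into `entity_dict` each round."""
    labeled = []
    for sent in sentences:
        lowered = sent.lower()
        spans = [(term, etype) for term, etype in entity_dict.items()
                 if term in lowered]
        labeled.append((sent, spans))
    return labeled

# Invented seed dictionary and sentence, for illustration only.
seed = {"glioblastoma": "Disease", "temozolomide": "Drug"}
data = distant_label(["Temozolomide is standard therapy for glioblastoma."],
                     seed)
```

Real systems add span offsets, longest-match handling, and noise filtering; substring matching here is deliberately minimal.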


Mask-guided BERT for Few Shot Text Classification

arXiv.org Artificial Intelligence

Transformer-based language models have achieved significant success in various domains. However, the data-intensive nature of the transformer architecture requires large amounts of labeled data, which is challenging in low-resource scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is the difficulty of training robust models on small numbers of samples, which frequently leads to overfitting. Here we present Mask-BERT, a simple and modular framework to help BERT-based architectures tackle FSL. The proposed approach fundamentally differs from existing FSL strategies such as prompt tuning and meta-learning. The core idea is to selectively apply masks on text inputs and filter out irrelevant information, which guides the model to focus on discriminative tokens that influence prediction results. In addition, to make the text representations from different categories more separable and the text representations from the same category more compact, we introduce a contrastive learning loss function. Experimental results on public-domain benchmark datasets demonstrate the effectiveness of Mask-BERT.
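The masking idea can be shown in miniature: replace every token outside a task-relevant set with the mask symbol, so the encoder attends only to discriminative tokens. How the relevant set is chosen (the interesting part, e.g. via saliency) is simplified away in this sketch, and the example tokens are invented.

```python
def mask_irrelevant(tokens, keep, mask_token="[MASK]"):
    """Core idea of Mask-BERT in miniature: mask tokens outside a
    task-relevant `keep` set so the encoder focuses on discriminative
    ones. Selecting `keep` automatically is the hard part and is
    assumed given here."""
    return [t if t.lower() in keep else mask_token for t in tokens]

masked = mask_irrelevant(
    ["The", "patient", "denies", "chest", "pain", "today"],
    keep={"denies", "chest", "pain"},
)
```

The masked sequence is then encoded as usual; the paper's contrastive loss additionally pulls same-class representations together and pushes different-class ones apart.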


Joint Intent Detection and Slot Filling with Wheel-Graph Attention Networks

arXiv.org Artificial Intelligence

Multiple deep learning-based joint models have demonstrated excellent results on the two tasks. In this paper, we propose a new joint model with a wheel-graph attention network (Wheel-GAT) which is able to model interrelated connections directly for intent detection and slot filling. To construct a graph structure for utterances, we create intent nodes, slot nodes, and directed edges. Intent nodes can provide utterance-level semantic information for slot filling, while slot nodes can also provide local keyword information for intent. Experiments show that our model outperforms multiple baselines on two public datasets.

[Table 1: An example with intent and slot annotation (BIO format), which indicates the slot of movie name from an utterance with an intent PlayMusic.]

The SLU module takes user utterance as input and performs three tasks: domain determination, intent detection, and slot filling [11]. Among them, the first two tasks are often framed as a classification problem, which infers the domain or intent (from a predefined set of candidates) based on the current user utterance [27].
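The utterance graph described can be sketched as a wheel: an utterance-level intent node at the hub, one node per token on the rim, directed edges in both directions between hub and rim (utterance-level information flowing to slots, keyword information flowing to the intent), plus edges between adjacent tokens. The exact edge scheme is an assumption based on the abstract's description.

```python
def build_wheel_graph(tokens):
    """Sketch of a Wheel-GAT-style utterance graph: an intent node at
    the hub, one slot node per token on the rim, directed edges both
    ways between hub and each rim node, plus adjacency edges between
    consecutive tokens. (Edge details are assumed, not taken from
    the paper.)"""
    intent = "INTENT"
    edges = []
    for i, tok in enumerate(tokens):
        node = f"tok{i}:{tok}"
        edges.append((intent, node))  # intent -> slot: utterance-level info
        edges.append((node, intent))  # slot -> intent: local keyword info
        if i + 1 < len(tokens):      # rim adjacency between neighbors
            edges.append((node, f"tok{i+1}:{tokens[i+1]}"))
    return edges

edges = build_wheel_graph(["play", "hey", "jude"])
```

A graph attention layer over this structure lets each slot prediction condition on the utterance-level intent and vice versa, which is the joint-modeling benefit the paper claims.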