Text Analysis


Textarium: Entangling Annotation, Abstraction and Argument

Proff, Philipp, Dörk, Marian

arXiv.org Artificial Intelligence

We present a web-based environment that connects annotation, abstraction, and argumentation during the interpretation of text. As a visual interface for scholarly reading and writing, Textarium combines human analysis with lightweight computational processing to bridge close and distant reading practices. Readers can highlight text, group keywords into concepts, and embed these observations as anchors in essays. The interface renders these interpretive actions as parameterized visualization states. Through a speculative design process of co-creative and iterative prototyping, we developed a reading-writing approach that makes interpretive processes transparent and shareable within digital narratives.


ModernBERT is More Efficient than Conventional BERT for Chest CT Findings Classification in Japanese Radiology Reports

Yamagishi, Yosuke, Kikuchi, Tomohiro, Hanaoka, Shouhei, Yoshikawa, Takeharu, Abe, Osamu

arXiv.org Artificial Intelligence

Objective: This study aims to evaluate and compare the performance of two Japanese language models, the conventional Bidirectional Encoder Representations from Transformers (BERT) and the newer ModernBERT, in classifying findings from chest CT reports, with a focus on tokenization efficiency, processing time, and classification performance. Methods: We conducted a retrospective study using the CT-RATE-JPN dataset, which contains 22,778 training reports and 150 test reports. Both models were fine-tuned for multi-label classification of 18 common chest CT conditions. The training data was split 18,222:4,556 into training and validation sets. Performance was evaluated using F1 scores for each condition and exact match accuracy across all 18 labels. Results: ModernBERT demonstrated superior tokenization efficiency, requiring 24.0% fewer tokens per document (258.1 vs. 339.6) than BERT Base. This translated into substantial speed improvements: ModernBERT completed training in 1,877.67 seconds versus BERT's 3,090.54 seconds (a 39% reduction), and processed 38.82 samples per second during training (1.65x faster) and 139.90 samples per second during inference (1.66x faster). Despite these efficiency gains, classification performance remained comparable, with ModernBERT achieving superior F1 scores in 8 conditions, while BERT performed better in 4 conditions. Overall exact match accuracy was slightly higher for ModernBERT (74.67% vs. 72.67%), though the difference was not statistically significant (p=0.6291). Conclusion: ModernBERT offers substantial improvements in tokenization efficiency and training speed without sacrificing classification performance. These results suggest that ModernBERT is a promising candidate for clinical applications in the analysis of Japanese radiology reports.
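
To make the setup concrete, here is a minimal sketch of multi-label fine-tuning with Hugging Face Transformers. The model id and the dummy dataset are illustrative assumptions (the study fine-tunes Japanese BERT and ModernBERT variants on CT-RATE-JPN reports, not this English checkpoint):

```python
# Minimal multi-label fine-tuning sketch; model id and data are stand-ins.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_LABELS = 18  # the 18 common chest CT conditions

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # BCE-with-logits loss
)

# Placeholder data; in the study, each report carries a 0/1 vector over
# the 18 findings (stored as floats, as multi-label loss requires).
train_ds = Dataset.from_dict({
    "text": ["illustrative chest CT report text"] * 8,
    "labels": [[0.0] * NUM_LABELS] * 8,
}).map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ct-findings", num_train_epochs=1),
    train_dataset=train_ds,
    tokenizer=tokenizer,  # enables dynamic padding during collation
)
trainer.train()
```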


Prediction of Item Difficulty for Reading Comprehension Items by Creation of Annotated Item Repository

Kapoor, Radhika, Truong, Sang T., Haber, Nick, Ruiz-Primo, Maria Araceli, Domingue, Benjamin W.

arXiv.org Artificial Intelligence

Predicting item difficulty from text content is of substantial interest. In this paper, we focus on the related problem of recovering IRT-based difficulty when the available data report only item p-values (the percentage of correct responses). We model item difficulty using a repository of reading passages and student data from US standardized tests administered in New York and Texas for grades 3-8 over the years 2017-2023. This repository is annotated with metadata on (1) linguistic features of the reading items, (2) test features of the passage, and (3) context features. A penalized regression model using all of these features predicts item difficulty with an RMSE of 0.52, compared with a baseline RMSE of 0.92, and a correlation of 0.77 between true and predicted difficulty. We supplement these features with embeddings from LLMs (ModernBERT, BERT, and Llama), which marginally improve item difficulty prediction. When models use only item linguistic features or only LLM embeddings, prediction performance is similar, which suggests that just one of these feature categories may be required. This item difficulty prediction model can be used to filter and categorize reading items and will be made publicly available for use by other stakeholders.
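
As a rough illustration of the modeling approach, the following sketch fits a cross-validated penalized regression on concatenated item features and embeddings. The synthetic arrays are placeholders for the repository's actual feature pipeline:

```python
# Hedged sketch: penalized regression from item features + embeddings
# to IRT-based difficulty. All data below is synthetic.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_ling = rng.normal(size=(500, 30))   # linguistic/test/context features
X_emb = rng.normal(size=(500, 768))   # LLM embeddings of item text
y = rng.normal(size=500)              # IRT-based difficulty estimates

X = np.hstack([X_ling, X_emb])
model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13)))
rmse = -cross_val_score(model, X, y, cv=5,
                        scoring="neg_root_mean_squared_error")
print(f"cross-validated RMSE: {rmse.mean():.2f}")
```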


A Methodology for Studying Linguistic and Cultural Change in China, 1900-1950

Stewart, Spencer Dean

arXiv.org Artificial Intelligence

This paper presents a quantitative approach to studying linguistic and cultural change in China during the first half of the twentieth century, a period that remains understudied in computational humanities research. The dramatic changes in Chinese language and culture during this time call for greater reflection on the tools and methods used for text analysis. This preliminary study offers a framework for analyzing Chinese texts from the late nineteenth and twentieth centuries, demonstrating how established methods such as word counts and word embeddings can provide new historical insights into the complex negotiations between Western modernity and Chinese cultural discourse.
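
For readers unfamiliar with the embedding side of this toolkit, a minimal sketch of one such established method follows: train separate word embeddings on two time slices and compare a term's nearest neighbours across periods. The toy corpora and the query term are placeholders, not the paper's data:

```python
# Illustrative diachronic-embedding sketch with gensim; corpora are toys.
from gensim.models import Word2Vec

# Tokenized sentences from two time slices of a historical corpus.
corpus_early = [["科学", "救国", "西方"], ["传统", "礼教", "科学"]] * 50
corpus_late = [["科学", "技术", "国家"], ["科学", "建设", "现代"]] * 50

m_early = Word2Vec(corpus_early, vector_size=50, window=5, min_count=1, seed=0)
m_late = Word2Vec(corpus_late, vector_size=50, window=5, min_count=1, seed=0)

term = "科学"  # "science", an illustrative query term
print(m_early.wv.most_similar(term, topn=5))
print(m_late.wv.most_similar(term, topn=5))
```

Comparing vectors directly across independently trained models would additionally require alignment (e.g., orthogonal Procrustes); nearest-neighbour lists sidestep that step.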


Comparative Analysis of Topic Modeling Techniques on ATSB Text Narratives Using Natural Language Processing

Nanyonga, Aziida, Wasswa, Hassan, Turhan, Ugur, Joiner, Keith, Wild, Graham

arXiv.org Artificial Intelligence

Improving aviation safety analysis calls for innovative techniques to extract valuable insights from the abundance of textual data available in accident reports. This paper explores the application of four prominent topic modeling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), to dissect aviation incident narratives using the Australian Transport Safety Bureau (ATSB) dataset. The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gaining actionable insights. Through a comparative analysis, this research not only showcases the potential of these methods in aviation safety but also elucidates their distinct advantages and limitations.
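
A minimal sketch of such a comparison with scikit-learn follows. pLSA has no scikit-learn implementation, so it is approximated here by NMF with a Kullback-Leibler objective, to which it is closely related; the toy narratives stand in for the ATSB dataset:

```python
# Fit four topic models on the same corpus for side-by-side inspection.
from sklearn.decomposition import LatentDirichletAllocation, NMF, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["engine failure during initial climb",
        "runway excursion after heavy landing"] * 50  # stand-in narratives
n_topics = 5

tf = CountVectorizer(stop_words="english").fit_transform(docs)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(tf)
lsa = TruncatedSVD(n_components=n_topics, random_state=0).fit(tfidf)
nmf = NMF(n_components=n_topics, init="nndsvd", random_state=0).fit(tfidf)
# pLSA is closely related to NMF under a Kullback-Leibler objective:
plsa_like = NMF(n_components=n_topics, beta_loss="kullback-leibler",
                solver="mu", init="nndsvda", random_state=0).fit(tf)
```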


Selecting Between BERT and GPT for Text Classification in Political Science Research

Wang, Yu, Qu, Wen, Ye, Xin

arXiv.org Artificial Intelligence

Political scientists often grapple with data scarcity in text classification. Recently, fine-tuned BERT models and their variants have gained traction as effective solutions to address this issue. In this study, we investigate the potential of GPT-based models combined with prompt engineering as a viable alternative. We conduct a series of experiments across various classification tasks, differing in the number of classes and complexity, to evaluate the effectiveness of BERT-based versus GPT-based models in low-data scenarios. Our findings indicate that while zero-shot and few-shot learning with GPT models provide reasonable performance and are well suited to early-stage research exploration, they generally fall short of, or at best match, the performance of BERT fine-tuning, particularly as the training set reaches a substantial size (e.g., 1,000 samples). We conclude by comparing these approaches in terms of performance, ease of use, and cost, providing practical guidance for researchers facing data limitations. Our results are particularly relevant for those engaged in quantitative text analysis in low-resource settings or with limited labeled data.
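
A hedged sketch of the zero-shot GPT baseline involved in such a comparison is shown below; the model name, label set, and prompt wording are illustrative assumptions, not the authors' exact setup:

```python
# Zero-shot classification via a plain-language prompt; labels are toys.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["supportive", "opposed", "neutral"]  # hypothetical classes

def zero_shot_classify(text: str) -> str:
    prompt = (f"Classify the following statement as one of "
              f"{', '.join(LABELS)}. Answer with the label only.\n\n{text}")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()
```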


Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use

Jiang, Hang, Beeferman, Doug, Brannon, William, Heyward, Andrew, Roy, Deb

arXiv.org Artificial Intelligence

Words often carry different meanings for people from diverse backgrounds. Today's era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views. The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model. These summaries highlight how the terms are used distinctively by Republicans and Democrats. Additionally, the Bridging Dictionary offers an interactive interface that lets users explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides. We present a use case for journalists and emphasize the importance of human agency and trust in further enhancing this tool. The deployed version of Bridging Dictionary is available at https://dictionary.ccc-mit.org/.


What is Sentiment Meant to Mean to Language Models?

Burnham, Michael

arXiv.org Artificial Intelligence

Sentiment analysis is one of the most widely used techniques in text analysis. Recent advancements with Large Language Models have made it more accurate and accessible than ever, allowing researchers to classify text with only a plain English prompt. However, "sentiment" entails a wide variety of concepts depending on the domain and tools used. It has been used to mean emotion, opinions, market movements, or simply a general "good-bad" dimension. This raises a question: What exactly are language models doing when prompted to label documents by sentiment? This paper first overviews how sentiment is defined across different contexts, highlighting that it is a confounded measurement construct in that it entails multiple variables, such as emotional valence and opinion, without disentangling them. I then test three language models across two data sets with prompts requesting sentiment, valence, and stance classification. I find that sentiment labels most strongly correlate with valence labels. I further find that classification improves when researchers more precisely specify their dimension of interest rather than using the less well-defined concept of sentiment. I conclude by encouraging researchers to move beyond "sentiment" when feasible and use a more precise measurement construct.
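
The core comparison can be illustrated with a short sketch: correlate the ordinal labels elicited under different prompts for the same documents. The dummy arrays below stand in for the classifications the paper collects:

```python
# Correlate labels elicited under different prompt framings; data is dummy.
import numpy as np
from scipy.stats import spearmanr

# Ordinal codings (negative=0, neutral=1, positive=2) per document.
rng = np.random.default_rng(0)
sentiment = rng.integers(0, 3, size=200)
valence = sentiment.copy()             # stand-in for strongly related labels
stance = rng.integers(0, 3, size=200)  # stand-in for weakly related labels

rho_sv, _ = spearmanr(sentiment, valence)
rho_ss, _ = spearmanr(sentiment, stance)
print(f"sentiment~valence rho={rho_sv:.2f}, sentiment~stance rho={rho_ss:.2f}")
```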


Surveying the Dead Minds: Historical-Psychological Text Analysis with Contextualized Construct Representation (CCR) for Classical Chinese

Chen, Yuqi, Li, Sixuan, Li, Ying, Atari, Mohammad

arXiv.org Artificial Intelligence

In this work, we develop a pipeline for historical-psychological text analysis in classical Chinese. Humans have produced texts in various languages for thousands of years; however, most of the computational literature is focused on contemporary languages and corpora. The emerging field of historical psychology relies on computational techniques to extract aspects of psychology from historical corpora using new methods developed in natural language processing (NLP). The present pipeline, called Contextualized Construct Representations (CCR), combines expert knowledge in psychometrics (i.e., psychological surveys) with text representations generated via transformer-based language models to measure psychological constructs such as traditionalism, norm strength, and collectivism in classical Chinese corpora. Considering the scarcity of available data, we propose an indirect supervised contrastive learning approach and build the first Chinese historical psychology corpus (C-HI-PSY) to fine-tune pre-trained models. We evaluate the pipeline to demonstrate its superior performance compared with other approaches. The CCR method outperforms word-embedding-based approaches across all of our tasks and exceeds prompting with GPT-4 in most tasks. Finally, we benchmark the pipeline against objective, external data to further verify its validity.
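
A minimal sketch of the CCR idea follows: represent a construct by embedding its questionnaire items, then score a passage by cosine similarity to that representation. The model name and item wordings are illustrative stand-ins, not the fine-tuned model or validated scales the authors use:

```python
# Construct scoring via questionnaire-item embeddings; all names are toys.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

traditionalism_items = [
    "One should follow the ways handed down by one's ancestors.",
    "Long-established customs ought to be preserved.",
]  # hypothetical survey items operationalizing the construct

construct_vec = model.encode(traditionalism_items).mean(axis=0)

def ccr_score(passage: str) -> float:
    """Cosine similarity between a passage and the construct vector."""
    v = model.encode([passage])[0]
    return float(np.dot(v, construct_vec) /
                 (np.linalg.norm(v) * np.linalg.norm(construct_vec)))
```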


Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis

Xu, Shaochen, Wu, Zihao, Zhao, Huaqin, Shu, Peng, Liu, Zhengliang, Liao, Wenxiong, Li, Sheng, Sikora, Andrea, Liu, Tianming, Li, Xiang

arXiv.org Artificial Intelligence

The analysis of medical texts is a key component of healthcare informatics, where the accurate comparison and interpretation of documents can significantly impact patient care and medical research. Traditionally, this analysis has relied on lexical comparison metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [1] and BLEU (Bilingual Evaluation Understudy) [2], which have become standard tools for evaluating text similarity in natural language processing (NLP). ROUGE and BLEU were originally designed to assess the quality of automatic summarization and machine translation, respectively, by measuring the overlap of n-grams between generated texts and reference texts.

While these metrics have been instrumental in advancing NLP applications, their application to medical text analysis reveals inherent limitations. Specifically, ROUGE and BLEU focus predominantly on surface-level lexical similarity, often overlooking the deep semantic meanings and clinical implications embedded within medical documents. This gap in capturing the essence and context of medical language presents a significant challenge to meaningful analysis in healthcare.

Recognizing these limitations, this research proposes a novel methodology that employs GPT-4, a state-of-the-art large language model, for a more sophisticated analysis of medical texts. GPT-4's advanced understanding of context and semantics [3, 4, 5] offers an opportunity to move beyond traditional lexical analysis, enabling a deeper, more meaningful comparison of medical documents [6, 7]. This approach not only addresses the shortcomings of ROUGE and BLEU but also aligns with the evolving needs of medical data analysis, where the accurate interpretation of texts is paramount.
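
A hedged sketch of the proposed direction is given below: prompt GPT-4 to reason about two clinical texts before emitting a similarity score, in contrast to n-gram overlap metrics. The prompt is an illustrative reconstruction, not the authors' exact protocol:

```python
# Reason-then-score semantic comparison with an LLM; prompt is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_similarity(text_a: str, text_b: str) -> str:
    prompt = (
        "Compare the two clinical texts below. First reason step by step "
        "about shared findings, diagnoses, and clinical implications; then "
        "output a semantic similarity score from 0 to 10.\n\n"
        f"Text A:\n{text_a}\n\nText B:\n{text_b}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```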