content analysis
Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research
Kravets-Meinke, Daria, Schmid-Petri, Hannah, Niemann, Sonja, Schmid, Ute
Generative Large Language Models (gLLMs), such as ChatGPT, are increasingly being used in communication research for content analysis. Studies show that gLLMs can outperform both crowd workers and trained coders, such as research assistants, on various coding tasks relevant to communication science, often at a fraction of the time and cost. Additionally, gLLMs can decode implicit meanings and contextual information, be instructed using natural language, deployed with only basic programming skills, and require little to no annotated data beyond a validation dataset - constituting a paradigm shift in automated content analysis. Despite their potential, the integration of gLLMs into the methodological toolkit of communication research remains underdeveloped. In gLLM-assisted quantitative content analysis, researchers must address at least seven critical challenges that impact result quality: (1) codebook development, (2) prompt engineering, (3) model selection, (4) parameter tuning, (5) iterative refinement, (6) validation of the model's reliability, and optionally, (7) performance enhancement. This paper synthesizes emerging research on gLLM-assisted quantitative content analysis and proposes a comprehensive best-practice guide to navigate these challenges. Our goal is to make gLLM-based content analysis more accessible to a broader range of communication researchers and ensure adherence to established disciplinary quality standards of validity, reliability, reproducibility, and research ethics.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Hierarchical Error Framework for Reliable Automated Coding in Communication Research: Applications to Health and Political Communication
Automated content analysis increasingly supports communication research, yet scaling manual coding into computational pipelines raises concerns about measurement reliability and validity. We introduce a Hierarchical Error Correction (HEC) framework that treats model failures as layered measurement errors (knowledge gaps, reasoning limitations, and complexity constraints) and targets the layers that most affect inference. The framework implements a three-phase methodology: systematic error profiling across hierarchical layers, targeted intervention design matched to dominant error sources, and rigorous validation with statistical testing. Evaluating HEC across health communication (medical specialty classification) and political communication (bias detection), and legal tasks, we validate the approach with five diverse large language models. Results show average accuracy gains of 11.2 percentage points (p < .001, McNemar's test) and stable conclusions via reduced systematic misclassification. Cross-model validation demonstrates consistent improvements (range: +6.8 to +14.6pp), with effectiveness concentrated in moderate-to-high baseline tasks (50-85% accuracy). A boundary study reveals diminished returns in very high-baseline (>85%) or precision-matching tasks, establishing applicability limits. We map layered errors to threats to construct and criterion validity and provide a transparent, measurement-first blueprint for diagnosing error profiles, selecting targeted interventions, and reporting reliability/validity evidence alongside accuracy. This applies to automated coding across communication research and the broader social sciences.
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Macao (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Law (0.95)
- Health & Medicine > Therapeutic Area (0.46)
Echoes of Humanity: Exploring the Perceived Humanness of AI Music
Figueiredo, Flavio, Martinelli, Giovanni, Sousa, Henrique, Rodrigues, Pedro, Pedrosa, Frederico, Ferreira, Lucas N.
Recent advances in AI music (AIM) generation services are currently transforming the music industry. Given these advances, understanding how humans perceive AIM is crucial both to educate users on identifying AIM songs, and, conversely, to improve current models. We present results from a listener-focused experiment aimed at understanding how humans perceive AIM. In a blind, Turing-like test, participants were asked to distinguish, from a pair, the AIM and human-made song. We contrast with other studies by utilizing a randomized controlled crossover trial that controls for pairwise similarity and allows for a causal interpretation. We are also the first study to employ a novel, author-uncontrolled dataset of AIM songs from real-world usage of commercial models (i.e., Suno). We establish that listeners' reliability in distinguishing AIM causally increases when pairs are similar. Lastly, we conduct a mixed-methods content analysis of listeners' free-form feedback, revealing a focus on vocal and technical cues in their judgments.
- South America > Brazil > Minas Gerais (0.04)
- North America > United States (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Empowering Computing Education Researchers Through LLM-Assisted Content Analysis
Gale, Laurie, Nicolajsen, Sebastian Mateos
Computing education research (CER) is often instigated by practitioners wanting to improve both their own and the wider discipline's teaching practice. However, the latter is often difficult as many researchers lack the colleagues, resources, or capacity to conduct research that is generalisable or rigorous enough to advance the discipline. As a result, research methods that enable sense-making with larger volumes of qualitative data, while not increasing the burden on the researcher, have significant potential within CER. In this discussion paper, we propose such a method for conducting rigorous analysis on large volumes of textual data, namely a variation of LLM-assisted content analysis (LACA). This method combines content analysis with the use of large language models, empowering researchers to conduct larger-scale research which they would otherwise not be able to perform. Using a computing education dataset, we illustrate how LACA could be applied in a reproducible and rigorous manner. We believe this method has potential in CER, enabling more generalisable findings from a wider range of research. This, together with the development of similar methods, can help to advance both the practice and research quality of the CER discipline.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.86)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Virginia (0.04)
- (5 more...)
- Education (0.69)
- Information Technology > Hardware (0.40)
ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
Hoyle, Alexander, Calvo-Bartolomé, Lorena, Boyd-Graber, Jordan, Resnik, Philip
Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations. Package, web interface, and data are at https://github.com/ahoho/proxann
- Asia > Middle East > Jordan (0.05)
- North America > United States > Ohio (0.04)
- North America > United States > Maryland (0.04)
- (25 more...)
- Leisure & Entertainment > Sports > Baseball (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Recalibrating the Compass: Integrating Large Language Models into Classical Research Methods
This paper examines how large language models (LLMs) are transforming core quantitative methods in communication research in particular, and in the social sciences more broadly-namely, content analysis, survey research, and experimental studies. Rather than replacing classical approaches, LLMs introduce new possibilities for coding and interpreting text, simulating dynamic respondents, and generating personalized and interactive stimuli. Drawing on recent interdisciplinary work, the paper highlights both the potential and limitations of LLMs as research tools, including issues of validity, bias, and interpretability. To situate these developments theoretically, the paper revisits Lasswell's foundational framework -- "Who says what, in which channel, to whom, with what effect?" -- and demonstrates how LLMs reconfigure message studies, audience analysis, and effects research by enabling interpretive variation, audience trajectory modeling, and counterfactual experimentation. Revisiting the metaphor of the methodological compass, the paper argues that classical research logics remain essential as the field integrates LLMs and generative AI. By treating LLMs not only as technical instruments but also as epistemic and cultural tools, the paper calls for thoughtful, rigorous, and imaginative use of LLMs in future communication and social science research.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Michigan > Ingham County > Lansing (0.04)
- North America > United States > Michigan > Ingham County > East Lansing (0.04)
- (5 more...)
- Research Report > Experimental Study (0.88)
- Research Report > New Finding (0.88)
- Government (1.00)
- Media > News (0.67)
- Health & Medicine > Therapeutic Area (0.46)
Data to Decisions: A Computational Framework to Identify skill requirements from Advertorial Data
Singh, Aakash, Kanaujia, Anurag, Singh, Vivek Kumar
Among the factors of production, human capital or skilled manpower is the one that keeps evolving and adapts to changing conditions and resources. This adaptability makes human capital the most crucial factor in ensuring a sustainable growth of industry/sector. As new technologies are developed and adopted, the new generations are required to acquire skills in newer technologies in order to be employable. At the same time professionals are required to upskill and reskill themselves to remain relevant in the industry. There is however no straightforward method to identify the skill needs of the industry at a given point of time. Therefore, this paper proposes a data to decision framework that can successfully identify the desired skill set in a given area by analysing the advertorial data collected from popular online job portals and supplied as input to the framework. The proposed framework uses techniques of statistical analysis, data mining and natural language processing for the purpose. The applicability of the framework is demonstrated on CS&IT job advertisement data from India. The analytical results not only provide useful insights about current state of skill needs in CS&IT industry but also provide practical implications to prospective job applicants, training agencies, and institutions of higher education & professional training.
- North America > United States (0.68)
- Asia > India > Karnataka > Bengaluru (0.05)
- Asia > India > Maharashtra > Mumbai (0.04)
- (9 more...)
- Instructional Material (0.88)
- Research Report > New Finding (0.67)
- Banking & Finance (1.00)
- Information Technology > Software (0.94)
- Education > Educational Setting > Higher Education (0.68)
- Government > Regional Government (0.68)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
MediaMind: Revolutionizing Media Monitoring using Agentification
Gunduz, Ahmet, Yuksel, Kamer Ali, Sawaf, Hassan
In an era of rapid technological advancements, agentification of software tools has emerged as a critical innovation, enabling systems to function autonomously and adaptively. This paper introduces MediaMind as a case study to demonstrate the agentification process, highlighting how existing software can be transformed into intelligent agents capable of independent decision-making and dynamic interaction. Developed by aiXplain, MediaMind leverages agent-based architecture to autonomously monitor, analyze, and provide insights from multilingual media content in real time. The focus of this paper is on the technical methodologies and design principles behind agentifying MediaMind, showcasing how agentification enhances adaptability, efficiency, and responsiveness. Through detailed case studies and practical examples, we illustrate how the agentification of MediaMind empowers organizations to streamline workflows, optimize decision-making, and respond to evolving trends. This work underscores the broader potential of agentification to revolutionize software tools across various domains.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia (0.04)
SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention
Zhao, Chengshuai, Tan, Zhen, Wong, Chau-Wai, Zhao, Xinyan, Chen, Tianlong, Liu, Huan
Content analysis breaks down complex and unstructured texts into theory-informed numerical categories. Particularly, in social science, this process usually relies on multiple rounds of manual annotation, domain expert discussion, and rule-based refinement. In this paper, we introduce SCALE, a novel multi-agent framework that effectively $\underline{\textbf{S}}$imulates $\underline{\textbf{C}}$ontent $\underline{\textbf{A}}$nalysis via $\underline{\textbf{L}}$arge language model (LLM) ag$\underline{\textbf{E}}$nts. SCALE imitates key phases of content analysis, including text coding, collaborative discussion, and dynamic codebook evolution, capturing the reflective depth and adaptive discussions of human researchers. Furthermore, by integrating diverse modes of human intervention, SCALE is augmented with expert input to further enhance its performance. Extensive evaluations on real-world datasets demonstrate that SCALE achieves human-approximated performance across various complex content analysis tasks, offering an innovative potential for future social science research.
- North America > United States > North Carolina (0.04)
- North America > United States > Florida > Orange County > Orlando (0.04)
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- (9 more...)
DreamLLM-3D: Affective Dream Reliving using Large Language Model and 3D Generative AI
Liu, Pinyao, Lee, Keon Ju, Steinmaurer, Alexander, Picard-Deland, Claudia, Carr, Michelle, Kitson, Alexandra
We present DreamLLM-3D, a composite multimodal AI system behind an immersive art installation for dream re-experiencing. It enables automated dream content analysis for immersive dream-reliving, by integrating a Large Language Model (LLM) with text-to-3D Generative AI. The LLM processes voiced dream reports to identify key dream entities (characters and objects), social interaction, and dream sentiment. The extracted entities are visualized as dynamic 3D point clouds, with emotional data influencing the color and soundscapes of the virtual dream environment. Additionally, we propose an experiential AI-Dreamworker Hybrid paradigm. Our system and paradigm could potentially facilitate a more emotionally engaging dream-reliving experience, enhancing personal insights and creativity.
- South America > Brazil > Rio de Janeiro (0.14)
- North America > United States > Connecticut > Fairfield County (0.14)
- Europe (0.14)