Goto

Collaborating Authors

 rq2


Computational Measurement of Political Positions: A Review of Text-Based Ideal Point Estimation Algorithms

Parschan, Patrick, Jakob, Charlott

arXiv.org Artificial Intelligence

This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has closely followed broader NLP trends -- beginning with word-frequency models and most recently turning to large language models (LLMs). While this trajectory has greatly expanded the methodological toolkit, it has also produced a fragmented field that lacks systematic comparison and clear guidance for applied use. To address this gap, we identified 25 CT-IPE algorithms through a systematic literature review and conducted a manual content analysis of their modeling assumptions and development contexts. To compare them meaningfully, we introduce a conceptual framework that distinguishes how algorithms generate, capture, and aggregate textual variance. On this basis, we identify four methodological families -- word-frequency, topic modeling, word embedding, and LLM-based approaches -- and critically assess their assumptions, interpretability, scalability, and limitations. Our review offers three contributions. First, it provides a structured synthesis of two decades of algorithm development, clarifying how diverse methods relate to one another. Second, it translates these insights into practical guidance for applied researchers, highlighting trade-offs in transparency, technical requirements, and validation strategies that shape algorithm choice. Third, it emphasizes that differences in estimation outcomes across algorithms are themselves informative, underscoring the need for systematic benchmarking.


Factors That Support Grounded Responses in LLM Conversations: A Rapid Review

Iwashima, Gabriele Cesar, Rodrigues, Claudia Susie, Dipolitto, Claudio, Xexéo, Geraldo

arXiv.org Artificial Intelligence

Large language models (LLMs) may generate outputs that are misaligned with user intent, lack contextual grounding, or exhibit hallucinations during conversation, which compromises the reliability of LLM-based applications. This review aimed to identify and analyze techniques that align LLM responses with conversational goals, ensure grounding, and reduce hallucination and topic drift. We conducted a Rapid Review guided by the PRISMA framework and the PICO strategy to structure the search, filtering, and selection processes. The alignment strategies identified were categorized according to the LLM lifecycle phase in which they operate: inference-time, post-training, and reinforcement learning-based methods. Among these, inference-time approaches emerged as particularly efficient, aligning outputs without retraining while supporting user intent, contextual grounding, and hallucination mitigation. The reviewed techniques provided structured mechanisms for improving the quality and reliability of LLM responses across key alignment objectives.


The Anatomy of Speech Persuasion: Linguistic Shifts in LLM-Modified Speeches

Barkar, Alisa, Chollet, Mathieu, Labeau, Matthieu, Biancardi, Beatrice, Clavel, Chloe

arXiv.org Artificial Intelligence

This study examines how large language models understand the concept of persuasiveness in public speaking by modifying speech transcripts from PhD candidates in the "Ma These en 180 Secondes" competition, using the 3MT French dataset. Our contributions include a novel methodology and an interpretable textual feature set integrating rhetorical devices and discourse markers. We prompt GPT-4o to enhance or diminish persuasiveness and analyze linguistic shifts between original and generated speech in terms of the new features. Results indicate that GPT-4o applies systematic stylistic modifications rather than optimizing persuasiveness in a human-like manner. Notably, it manipulates emotional lexicon and syntactic structures (such as interrogative and exclamatory clauses) to amplify rhetorical impact.


Generative Tools for Graphical Assets: Empirical Guidelines based on Game Designers' and Developers' Preferences

Fukaya, Kaisei, Daylamani-Zad, Damon, Agius, Harry

arXiv.org Artificial Intelligence

Graphical assets play an important role in the design and development of games. There is potential in the use of generative tools, to aid in creating graphical assets, thus improving game design and development pipelines. However, there is little research to address how the generative methods can fit into the wider pipeline. We conducted a user study with 16 game designers and developers to examine their preferences regarding generative tools for graphical assets. The findings highlight that early design stage is preferred by all participants (mean values above 0.67 and p < .001 for early stages). Designers and developers prefer to use such tools for creating large amounts of variations at the cost of quality as they can improve the quality of the artefacts once they generate a suitable asset (mean value 0.17 where 1 is high quality, p < .001). They also strongly (mean value .78, p < .001) raised the need for better integration of such tools in existing design and development environments and the need for the outputs to be in common data formats, to be manipulatable and integrate smoothly into existing environments (mean 3.5 out of 5, p = .004). The study also highlights the requirement for further emphasis on the needs of the users to incorporate these tools effectively in existing pipelines. Informed by these results, we provide a set of guidelines for creating tools that meet the expectations and needs of game designers and developers.


Addressing Quality Challenges in Deep Learning: The Role of MLOps and Domain Knowledge

del Rey, Santiago, Medina, Adrià, Franch, Xavier, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence

Deep learning (DL) systems present unique challenges in software engineering, especially concerning quality attributes like correctness and resource efficiency. While DL models achieve exceptional performance in specific tasks, engineering DL-based systems is still essential. The effort, cost, and potential diminishing returns of continual improvements must be carefully evaluated, as software engineers often face the critical decision of when to stop refining a system relative to its quality attributes. This experience paper explores the role of MLOps practices -- such as monitoring and experiment tracking -- in creating transparent and reproducible experimentation environments that enable teams to assess and justify the impact of design decisions on quality attributes. Furthermore, we report on experiences addressing the quality challenges by embedding domain knowledge into the design of a DL model and its integration within a larger system. The findings offer actionable insights into not only the benefits of domain knowledge and MLOps but also the strategic consideration of when to limit further optimizations in DL projects to maximize overall system quality and reliability.


Can Language Models Recognize Convincing Arguments?

Rescala, Paula, Ribeiro, Manoel Horta, Hu, Tiancheng, West, Robert

arXiv.org Artificial Intelligence

The remarkable and ever-increasing capabilities of Large Language Models (LLMs) have raised concerns about their potential misuse for creating personalized, convincing misinformation and propaganda. To gain insights into LLMs' persuasive capabilities without directly engaging in experimentation with humans, we propose studying their performance on the related task of detecting convincing arguments. We extend a dataset by Durmus & Cardie (2018) with debates, votes, and user traits and propose tasks measuring LLMs' ability to (1) distinguish between strong and weak arguments, (2) predict stances based on beliefs and demographic characteristics, and (3) determine the appeal of an argument to an individual based on their traits. We show that LLMs perform on par with humans in these tasks and that combining predictions from different LLMs yields significant performance gains, even surpassing human performance. The data and code released with this paper contribute to the crucial ongoing effort of continuously evaluating and monitoring the rapidly evolving capabilities and potential impact of LLMs.


Fine-grained Affective Processing Capabilities Emerging from Large Language Models

Broekens, Joost, Hilpert, Bernhard, Verberne, Suzan, Baraka, Kim, Gebhard, Patrick, Plaat, Aske

arXiv.org Artificial Intelligence

Large language models, in particular generative pre-trained transformers (GPTs), show impressive results on a wide variety of language-related tasks. In this paper, we explore ChatGPT's zero-shot ability to perform affective computing tasks using prompting alone. We show that ChatGPT a) performs meaningful sentiment analysis in the Valence, Arousal and Dominance dimensions, b) has meaningful emotion representations in terms of emotion categories and these affective dimensions, and c) can perform basic appraisal-based emotion elicitation of situations based on a prompt-based computational implementation of the OCC appraisal model. These findings are highly relevant: First, they show that the ability to solve complex affect processing tasks emerges from language-based token prediction trained on extensive data sets. Second, they show the potential of large language models for simulating, processing and analyzing human emotions, which has important implications for various applications such as sentiment analysis, socially interactive agents, and social robotics.


Fairness-aware machine learning: a perspective

Zliobaite, Indre

arXiv.org Machine Learning

Algorithms learned from data are increasingly used for deciding many aspects in our life: from movies we see, to prices we pay, or medicine we get. Yet there is growing evidence that decision making by inappropriately trained algorithms may unintentionally discriminate people. For example, in automated matching of candidate CVs with job descriptions, algorithms may capture and propagate ethnicity related biases. Several repairs for selected algorithms have already been proposed, but the underlying mechanisms how such discrimination happens from the computational perspective are not yet scientifically understood. We need to develop theoretical understanding how algorithms may become discriminatory, and establish fundamental machine learning principles for prevention. We need to analyze machine learning process as a whole to systematically explain the roots of discrimination occurrence, which will allow to devise global machine learning optimization criteria for guaranteed prevention, as opposed to pushing empirical constraints into existing algorithms case-by-case. As a result, the state-of-the-art will advance from heuristic repairing, to proactive and theoretically supported prevention. This is needed not only because law requires to protect vulnerable people. Penetration of big data initiatives will only increase, and computer science needs to provide solid explanations and accountability to the public, before public concerns lead to unnecessarily restrictive regulations against machine learning.