Goto

Collaborating Authors

 happiness


Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian

Mehrazar, Mobina, Yousefi, Mohammad Amin, Beygi, Parisa Abolfath, Bahrak, Behnam

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly used to generate self-explanations alongside their predictions, a practice that raises concerns about the faithfulness of these explanations, especially in low-resource languages. This study evaluates the faithfulness of LLM-generated explanations in the context of emotion classification in Persian, a low-resource language, by comparing the influential words identified by the model against those identified by human annotators. We assess faithfulness using confidence scores derived from token-level log-probabilities. Two prompting strategies, differing in the order of explanation and prediction (Predict-then-Explain and Explain-then-Predict), are tested for their impact on explanation faithfulness. Our results reveal that while LLMs achieve strong classification performance, their generated explanations often diverge from faithful reasoning, showing greater agreement with each other than with human judgments. These results highlight the limitations of current explanation methods and metrics, emphasizing the need for more robust approaches to ensure LLM reliability in multilingual and low-resource contexts.


Two Americas of Well-Being: Divergent Rural-Urban Patterns of Life Satisfaction and Happiness from 2.6 B Social Media Posts

Iacus, Stefano Maria, Porro, Giuseppe

arXiv.org Artificial Intelligence

Using 2.6 billion geolocated social-media posts (2014-2022) and a fine-tuned generative language model, we construct county-level indicators of life satisfaction and happiness for the United States. We document an apparent rural-urban paradox: rural counties express higher life satisfaction while urban counties exhibit greater happiness. We reconcile this by treating the two as distinct layers of subjective well-being, evaluative vs. hedonic, showing that each maps differently onto place, politics, and time. Republican-leaning areas appear more satisfied in evaluative terms, but partisan gaps in happiness largely flatten outside major metros, indicating context-dependent political effects. Temporal shocks dominate the hedonic layer: happiness falls sharply during 2020-2022, whereas life satisfaction moves more modestly. These patterns are robust across logistic and OLS specifications and align with well-being theory. Interpreted as associations for the population of social-media posts, the results show that large-scale, language-based indicators can resolve conflicting findings about the rural-urban divide by distinguishing the type of well-being expressed, offering a transparent, reproducible complement to traditional surveys.


The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013--2023

Iacus, Stefano M., Jain, Devika, Nasuto, Andrea, Porro, Giuseppe, Carammia, Marcello, Vezzulli, Andrea

arXiv.org Artificial Intelligence

Quantifying human flourishing, a multidimensional construct including happiness, health, purpose, virtue, relationships, and financial stability, is critical for understanding societal well-being beyond economic indicators. Existing measures often lack fine spatial and temporal resolution. Here we introduce the Human Flourishing Geographic Index (HFGI), derived from analyzing approximately 2.6 billion geolocated U.S. tweets (2013-2023) using fine-tuned large language models to classify expressions across 48 indicators aligned with Harvard's Global Flourishing Study framework plus attitudes towards migration and perception of corruption. The dataset offers monthly and yearly county- and state-level indicators of flourishing-related discourse, validated to confirm that the measures accurately represent the underlying constructs and show expected correlations with established indicators. This resource enables multidisciplinary analyses of well-being, inequality, and social change at unprecedented resolution, offering insights into the dynamics of human flourishing as reflected in social media discourse across the United States over the past decade.


Happiness as a Measure of Fairness

Pichler, Georg, Romanelli, Marco, Piantanida, Pablo

arXiv.org Machine Learning

In this paper, we propose a novel fairness framework grounded in the concept of happiness, a measure of the utility each group gains from decision outcomes. By capturing fairness through this intuitive lens, we not only offer a more human-centered approach, but also one that is mathematically rigorous: In order to compute the optimal, fair post-processing strategy, only a linear program needs to be solved. This makes our method both efficient and scalable with existing optimization tools. Furthermore, it unifies and extends several well-known fairness definitions, and our empirical results highlight its practical strengths across diverse scenarios.


Switchboard-Affect: Emotion Perception Labels from Conversational Speech

Romana, Amrit, Narain, Jaya, Tran, Tien Dung, Davis, Andrea, Fong, Jason, Rasipuram, Ramya, Mitra, Vikramjit

arXiv.org Artificial Intelligence

Abstract--Understanding the nuances of speech emotion dataset curation and labeling is essential for assessing speech emotion recognition (SER) model potential in real-world applications. Most training and evaluation datasets contain acted or pseudo-acted speech (e.g., podcast speech) in which emotion expressions may be exaggerated or otherwise intentionally modified. Furthermore, datasets labeled based on crowd perception often lack transparency regarding the guidelines given to annotators. These factors make it difficult to understand model performance and pinpoint necessary areas for improvement. T o address this gap, we identified the Switchboard corpus as a promising source of naturalistic conversational speech, and we trained a crowd to label the dataset for categorical emotions (anger, contempt, disgust, fear, sadness, surprise, happiness, tenderness, calmness, and neutral) and dimensional attributes (activation, valence, and dominance). We refer to this label set as Switchboard-Affect (SWB-Affect). In this work, we present our approach in detail, including the definitions provided to annotators and an analysis of the lexical and paralinguistic cues that may have played a role in their perception. In addition, we evaluate state-of-the-art SER models, and we find variable performance across the emotion categories with especially poor generalization for anger . These findings underscore the importance of evaluation with datasets that capture natural affective variations in speech. We release the labels for SWB-Affect to enable further analysis in this domain. Speech emotion recognition (SER) has the potential to enhance human-computer interaction, improve our ability to monitor mental health and well-being [1], [2], and better understand customer service, entertainment, and education experiences [3], [4].


Dating apps, booze and clubbing - Jane Austen's Emma comes into the 21st Century

BBC News

Dating apps, booze and clubbing - Jane Austen's Emma comes into the 21st Century And your pushy best friend is trying to sort out your love life. It's Jane Austen's Emma, but not as you know it. For the uninitiated, the 1815 novel follows the charmed life of our protagonist in Regency England as she busies herself interfering in her friends' relationships (or matchmaking, depending on your point of view). In Ava Pickett's fresh adaptation, being staged at London's Rose Theatre, Emma Woodhouse still has all the trademark traits of our beloved original heroine - she's clever, quick-witted, meddling, haughty and occasionally cruel. But instead of navigating society balls and dowries, Pickett's modern Emma is poking her nose into her friends' online dating profiles, having returned home after failing her exams at Oxford University.


A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts

Tohidi, Kian, Dashtipour, Kia, Rebora, Simone, Pourfaramarz, Sevda

arXiv.org Artificial Intelligence

This study presents a comprehensive comparative evaluation of four state-of-the-art Large Language Models (LLMs)--Claude 3.7 Sonnet, DeepSeek-V3, Gemini 2.0 Flash, and GPT-4o--for sentiment analysis and emotion detection in Persian social media texts. Comparative analysis among LLMs has witnessed a significant rise in recent years, however, most of these analyses have been conducted on English language tasks, creating gaps in understanding cross-linguistic performance patterns. This research addresses these gaps through rigorous experimental design using balanced Persian datasets containing 900 texts for sentiment analysis (positive, negative, neutral) and 1,800 texts for emotion detection (anger, fear, happiness, hate, sadness, surprise). The main focus was to allow for a direct and fair comparison among different models, by using consistent prompts, uniform processing parameters, and by analyzing the performance metrics such as precision, recall, F1-scores, along with misclassification patterns. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models indicates no significant differences among them. However, GPT-4o demonstrated a marginally higher raw accuracy value for both tasks, while Gemini 2.0 Flash proved to be the most cost-efficient. The findings indicate that the emotion detection task is more challenging for all models compared to the sentiment analysis task, and the misclassification patterns can represent some challenges in Persian language texts. These findings establish performance benchmarks for Persian NLP applications and offer practical guidance for model selection based on accuracy, efficiency, and cost considerations, while revealing cultural and linguistic challenges that require consideration in multilingual AI system deployment.


Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets

Soumik, Mohd. Farhan Israk, Hasan, Syed Mhamudul, Shahid, Abdur R.

arXiv.org Artificial Intelligence

The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By developing early novel datasets specifically for this purpose, we empirically assess how different text modifications influence LLM-based detection. This capability suggests strong potential for Apple Intelligence's writing tools as privacy-preserving mechanisms. Our findings lay the groundwork for future adaptive rewriting systems capable of dynamically neutralizing sensitive emotional content to enhance user privacy. To the best of our knowledge, this research provides the first empirical analysis of Apple Intelligence's text-modification tools within a privacy-preservation context with the broader goal of developing on-device, user-centric privacy-preserving mechanisms to protect against LLMs-based advanced inference attacks on deployed systems.


Emotion Recognition for Low-Resource Turkish: Fine-Tuning BERTurk on TREMO and Testing on Xenophobic Political Discourse

Wicaksono, Darmawan, Rozaq, Hasri Akbar Awal, Boz, Nevfel

arXiv.org Artificial Intelligence

Social media platforms like X (formerly Twitter) play a crucial role in shaping public discourse and societal norms. This study examines the term Sessiz Istila (Silent Invasion) on Turkish social media, highlighting the rise of anti-refugee sentiment amidst the Syrian refugee influx. Using BERTurk and the TREMO dataset, we developed an advanced Emotion Recognition Model (ERM) tailored for Turkish, achieving 92.62% accuracy in categorizing emotions such as happiness, fear, anger, sadness, disgust, and surprise. By applying this model to large-scale X data, the study uncovers emotional nuances in Turkish discourse, contributing to computational social science by advancing sentiment analysis in underrepresented languages and enhancing our understanding of global digital discourse and the unique linguistic challenges of Turkish. The findings underscore the transformative potential of localized NLP tools, with our ERM model offering practical applications for real-time sentiment analysis in Turkish-language contexts. By addressing critical areas, including marketing, public relations, and crisis management, these models facilitate improved decision-making through timely and accurate sentiment tracking. This highlights the significance of advancing research that accounts for regional and linguistic nuances.


This sticker reads emotions (even the ones you try to hide)

Popular Science

Good luck hiding how you feel. Researchers from Penn State University believe they have developed a stretchy, Band-Aid-sized wearable device capable of decoding even the most advanced poker face. The device attaches to a subject's skin and uses sensors to independently detect physiological responses, such as skin temperature and perspiration, in real time. That data is then digitized and analyzed by an AI model designed to determine the type of emotional responses the wearer is experiencing. In testing, the device was able to accurately identify the correct emotional response 89 percent of the time--significantly more accurate, the researchers say, than simply observing a person's facial expression.