AITopics

Fine-tuning Large Language Models (LLMs) with reinforcement learning to produce an explicit Chain-of-Thought (CoT) before answering produces models that consistently raise overall performance on code, math, and general-knowledge benchmarks. However, on benchmarks where LLMs currently achieve low accuracy, such as Humanity's Last Exam (HLE), they often report high self-confidence, reflecting poor calibration. Here, we test whether measurable properties of the CoT provide reliable signals of an LLM's internal confidence in its answers. We analyze three feature classes: (i) CoT length, (ii) intra-CoT sentiment volatility, and (iii) lexicographic hints, including hedging words. Using DeepSeek-R1 and Claude 3.7 Sonnet on both Humanity's Last Exam (HLE), a frontier benchmark with very low accuracy, and Omni-MATH, a saturated benchmark of moderate difficulty, we find that lexical markers of uncertainty (e.g., $\textit{guess}$, $\textit{stuck}$, $\textit{hard}$) in the CoT are the strongest indicators of an incorrect response, while shifts in the CoT sentiment provide a weaker but complementary signal. CoT length is informative only on Omni-MATH, where accuracy is already high ($\approx 70\%$), and carries no signal on the harder HLE ($\approx 9\%$), indicating that CoT length predicts correctness only in the intermediate-difficulty benchmarks, i.e., inside the model's demonstrated capability, but still below saturation. Finally, we find that uncertainty indicators in the CoT are consistently more salient than high-confidence markers, making errors easier to predict than correct responses. Our findings support a lightweight post-hoc calibration signal that complements unreliable self-reported probabilities and supports safer deployment of LLMs.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

2508.15842

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Mohammed, Idrees, Hassani, Hossein

Mining Mental Health Signals: A Comparative Study of Four Machine Learning Methods for Depression Detection from Social Media Posts in Sorani Kurdish

Depression is a common mental health condition that can lead to hopelessness, loss of interest, self-harm, and even suicide. Early detection is challenging due to individuals not self-reporting or seeking timely clinical help. With the rise of social media, users increasingly express emotions online, offering new opportunities for detection through text analysis. While prior research has focused on languages such as English, no studies exist for Sorani Kurdish. This work presents a machine learning and Natural Language Processing (NLP) approach to detect depression in Sorani tweets. A set of depression-related keywords was developed with expert input to collect 960 public tweets from X (Twitter platform). The dataset was annotated into three classes: Shows depression, Not-show depression, and Suspicious by academics and final year medical students at the University of Kurdistan Hewlêr. Four supervised models, including Support Vector Machines, Multinomial Naive Bayes, Logistic Regression, and Random Forest, were trained and evaluated, with Random Forest achieving the highest performance accuracy and F1-score of 80%. This study establishes a baseline for automated depression detection in Kurdish language contexts.

artificial intelligence, deep learning, machine learning, (14 more...)

2508.15829

Country: Asia > Middle East > Iraq > Kurdistan Region (0.35)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

When Algorithms Play Favorites: Lookism in the Generation and Perception of Faces

Doh, Miriam, Gulati, Aditya, Mancas, Matei, Oliver, Nuria

This paper examines how synthetically generated faces and machine learning-based gender classification algorithms are affected by algorithmic lookism, the preferential treatment based on appearance. In experiments with 13,200 synthetically generated faces, we find that: (1) text-to-image (T2I) systems tend to associate facial attractiveness to unrelated positive traits like intelligence and trustworthiness; and (2) gender classification models exhibit higher error rates on "less-attractive" faces, especially among non-White women. These result raise fairness concerns regarding digital identity systems.

artificial intelligence, machine learning, proceedings, (15 more...)

2506.11025

Country:

Europe > Spain (0.16)
Europe > Belgium (0.15)

Genre: Research Report (1.00)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Perceptual Implications of Automatic Anonymization in Pathological Speech

Arasteh, Soroosh Tayebi, Afza, Saba, Nguyen, Tri-Thien, Buess, Lukas, Parvin, Maryam, Arias-Vergara, Tomas, Perez-Toro, Paula Andrea, Hung, Hiu Ching, Lotfinia, Mahshad, Gorges, Thomas, Noeth, Elmar, Schuster, Maria, Yang, Seung Hee, Maier, Andreas

Automatic anonymization techniques are essential for ethical sharing of pathological speech data, yet their perceptual consequences remain understudied. We present a comprehensive human-centered analysis of anonymized pathological speech, using a structured protocol involving ten native and non-native German listeners with diverse linguistic, clinical, and technical backgrounds. Listeners evaluated anonymized-original utterance pairs from 180 speakers spanning Cleft Lip and Palate, Dysarthria, Dysglossia, Dysphonia, and healthy controls. Speech was anonymized using state-of-the-art automatic methods (equal error rates in the range of 30-40%). Listeners completed Turing-style discrimination and quality rating tasks under zero-shot (single-exposure) and few-shot (repeated-exposure) conditions. Discrimination accuracy was high overall (91% zero-shot; 93% few-shot), but varied by disorder (repeated-measures ANOVA: p=0.007), ranging from 96% (Dysarthria) to 86% (Dysphonia). Anonymization consistently reduced perceived quality across groups (from 83% to 59%, p<0.001), with pathology-specific degradation patterns (one-way ANOVA: p=0.005). Native listeners showed a non-significant trend toward higher original speech ratings (Delta=4%, p=0.199), but this difference was minimal after anonymization (Delta=1%, p=0.724). No significant gender-based bias was observed. Perceptual outcomes did not correlate with automatic metrics; intelligibility was linked to perceived quality in original speech but not after anonymization. These findings underscore the need for listener-informed, disorder-specific anonymization strategies that preserve both privacy and perceptual integrity.

artificial intelligence, machine learning, natural language, (21 more...)

2505.00409

Country:

North America > United States (1.00)
Europe > Germany (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)

The GuardianAug-23-2025, 06:00:57 GMT

Expert rejects Met police claim that study backs bias-free live facial recognition use

The Metropolitan police's claims that their use of live facial recognition is bias-free are not substantiated by the report they cite to support their case, a leading expert on the technology has said. The Met is planning its biggest and most high profile use of LFR yet this bank holiday weekend at Notting Hill carnival in west London. The Guardian understands it will be deployed at two sites on the approaches to the carnival, with the force insisting on its use despite the Equality and Human Rights Commission saying police use of LFR is unlawful. The new claims come from Prof Pete Fussey, who led the only independent academic review of police use of facial recognition, is a former reviewer of LFR for the Met from 2018-19, and currently advises other forces in the UK and abroad on its use. The Met says it has reformed its use of LFR after a 2023 study it commissioned from the National Physical Laboratory (NPL) and it is now, in effect, bias-free. But Fussey said: "The claims the Met are making about the absence of bias from the NPL report are not substantiated by the facts in that report."

artificial intelligence, fussey, machine learning, (7 more...)

The Guardian

Country: Europe > United Kingdom > England > Greater London > London (0.25)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (0.78)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Neural Information Processing SystemsAug-22-2025, 02:22:30 GMT

e0b0f9051084fd476926501af19e1e96-Paper.pdf

algorithm, sample complexity, sequence, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.47)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)