First Responders' Perceptions of Semantic Information for Situational Awareness in Robot-Assisted Emergency Response

Ruan, Tianshu, Betta, Zoe, Tzoumas, Georgios, Stolkin, Rustam, Chiou, Manolis

arXiv.org Artificial Intelligence

This study investigates First Responders' (FRs) attitudes toward the use of semantic information and Situational Awareness (SA) in robotic systems during emergency operations. A structured questionnaire was administered to 22 FRs across eight countries, capturing their demographic profiles, general attitudes toward robots, and experiences with semantics-enhanced SA. Results show that most FRs expressed positive attitudes toward robots and rated the usefulness of semantic information for building SA at an average of 3.6 out of 5. Semantic information was also valued for its role in predicting unforeseen emergencies (mean 3.9). Participants reported requiring an average of 74.6% accuracy to trust semantic outputs and 67.8% for them to be considered useful, revealing a willingness to use imperfect but informative AI support tools. To the best of our knowledge, this study offers novel insights by being one of the first to directly survey FRs on semantic-based SA in a cross-national context. It reveals the types of semantic information most valued in the field (such as object identity, spatial relationships, and risk context) and connects these preferences to the respondents' roles, experience, and education levels. The findings also expose a critical gap between lab-based robotics capabilities and the realities of field deployment, highlighting the need for more meaningful collaboration between FRs and robotics researchers. These insights contribute to the development of more user-aligned and situationally aware robotic systems for emergency response.


The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses

Yao, Jianzhou, Liu, Shunchang, Drui, Guillaume, Pettersson, Rikard, Blasimme, Alessandro, Kijewski, Sara

arXiv.org Artificial Intelligence

Large language models (LLMs) show promise for supporting clinicians in diagnostic communication by generating explanations and guidance for patients. Yet their ability to produce outputs that are both understandable and empathetic remains uncertain. We evaluate two leading LLMs on medical diagnostic scenarios, assessing understandability using readability metrics as a proxy and empathy through LLM-as-a-Judge ratings compared against human evaluations. The results indicate that LLMs adapt explanations to socio-demographic variables and patient conditions. However, they also generate overly complex content and display biased affective empathy, leading to uneven accessibility and support. These patterns underscore the need for systematic calibration to ensure equitable patient communication. Code and data are available at https://github.com/Jeffateth/Biased_Oracle
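
The abstract uses readability metrics as the understandability proxy but does not name them; the sketch below, assuming Flesch-style formulas from the textstat package, shows how such a proxy could be scored for a generated explanation.

```python
# Hedged sketch: proxy understandability scores for an LLM-generated
# diagnostic explanation. The specific metrics (Flesch reading ease,
# Flesch-Kincaid grade) are assumptions; the paper does not name its
# exact readability formulas.
import textstat

def readability_profile(explanation: str) -> dict:
    """Score one generated explanation on standard readability formulas."""
    return {
        # Higher is easier to read; roughly 60-70 corresponds to plain English.
        "flesch_reading_ease": textstat.flesch_reading_ease(explanation),
        # Approximate US school grade needed to follow the text.
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(explanation),
    }

if __name__ == "__main__":
    sample = ("Your blood test shows mild anemia, meaning you have fewer "
              "red blood cells than expected. It is often treatable with "
              "iron supplements and dietary changes.")
    print(readability_profile(sample))
```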


AI Debate Aids Assessment of Controversial Claims

Rahman, Salman, Issaka, Sheriff, Suvarna, Ashima, Liu, Genglin, Shiffer, James, Lee, Jaeyoung, Parvez, Md Rizwan, Palangi, Hamid, Feng, Shi, Peng, Nanyun, Choi, Yejin, Michael, Julian, Jiang, Liwei, Gabriel, Saadia

arXiv.org Artificial Intelligence

As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims about COVID-19 and climate change, topics on which people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
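
To make the two protocols concrete, the sketch below builds debate and consultancy transcripts around a generic LLM-query callable; the prompts, round structure, and function names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the two protocols compared in the studies: "debate"
# (two AI advisors argue opposing sides of a claim) versus "consultancy"
# (a single AI advisor). `query` stands in for any LLM API call.
from typing import Callable

Query = Callable[[str], str]  # prompt -> model response

def debate_transcript(claim: str, query: Query, rounds: int = 2) -> str:
    """Two advisors alternate arguments over a fixed number of rounds."""
    transcript = f"Claim under evaluation: {claim}\n"
    for r in range(1, rounds + 1):
        pro = query(f"Argue that the claim is TRUE.\n{transcript}")
        con = query(f"Argue that the claim is FALSE.\n{transcript}")
        transcript += f"[Round {r}] Advisor A (pro): {pro}\n"
        transcript += f"[Round {r}] Advisor B (con): {con}\n"
    return transcript

def consultancy_transcript(claim: str, query: Query) -> str:
    """A single advisor assesses the claim with no opposing voice."""
    advice = query(f"Assess whether this claim is true: {claim}")
    return f"Claim under evaluation: {claim}\n[Advisor]: {advice}\n"

# A judge (a human, or a persona-prompted LLM as in Study II) then reads
# the transcript and returns a verdict plus a confidence, which is
# scored against the claim's ground-truth label.
```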


Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinants

Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry

arXiv.org Artificial Intelligence

Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains (orientation, memory, attention, language, constructional praxis, and executive function) derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline that treats continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified the top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, including age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.
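
The abstract sketches the imputation idea (recover missing values from latent feature correlations) without implementation detail; below is a minimal iterative SVD imputation sketch for the continuous feature block, with the rank and iteration count as illustrative assumptions.

```python
# Minimal sketch of iterative SVD-based imputation for continuous
# features. The paper treats categorical variables separately; that
# branch, and the choices of rank and iteration count, are not
# specified here and are assumptions.
import numpy as np

def svd_impute(X: np.ndarray, rank: int = 5, n_iter: int = 50) -> np.ndarray:
    """Fill NaNs in X by repeatedly projecting onto a rank-k SVD."""
    X = X.astype(float).copy()
    missing = np.isnan(X)
    # Warm start: initialize missing cells with column means.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        X[missing] = low_rank[missing]  # overwrite only the missing cells
    return X
```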


From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital

Kumar, Mihir, Yin, Aaron Ontoyin, Salifu, Zakari, Amoaba, Kelvin, Samuel, Afriyie Kwesi, Alican, Fuat, Ihlamur, Yigit

arXiv.org Artificial Intelligence

This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture. The approach combines the predictive strength of black-box models with the interpretability required for reliable decision-making. We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data, which are then processed within a layered ensemble of models including XGBoost, Random Forest, and Linear Regression. The ensemble first produces a continuous estimate of success likelihood, which is then thresholded to yield a binary rare-event prediction. We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data. The empirical results show strong performance: the model achieves precision between 9.8x and 11.1x the random-classifier baseline across three independent test subsets. Feature sensitivity analysis further reveals interpretable success drivers: the startup's category list accounts for 15.6% of predictive influence, followed by the number of founders, while education level and domain expertise contribute smaller yet consistent effects.
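
A minimal sketch of the two-stage design follows: the three named base models are averaged into a continuous success-likelihood score, which is then thresholded into a binary rare-event flag. The equal weighting, hyperparameters, and threshold are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the layered ensemble: continuous scoring stage
# followed by a thresholding stage. LLM-engineered features are assumed
# to already be columns of X_train / X.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

def fit_ensemble(X_train, y_train):
    """Fit the three base models named in the abstract."""
    models = [
        XGBRegressor(n_estimators=200, max_depth=4),
        RandomForestRegressor(n_estimators=300),
        LinearRegression(),
    ]
    for m in models:
        m.fit(X_train, y_train)
    return models

def predict_rare_event(models, X, threshold: float = 0.8):
    """Continuous success-likelihood estimate, then a binary decision."""
    score = np.mean([m.predict(X) for m in models], axis=0)
    return score, (score >= threshold).astype(int)
```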


A Experiment Details

Neural Information Processing Systems

We train our models with the Adam optimizer using a learning rate of 0.002.

A.2 Data Processing

We use the following features for training our models. Our approach allows for customization of actionable features and constraints on their values. We require that education level can only increase. The actionable features are: (i) education level and (ii) the number of prison rule violations reported during the sample sentence.
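
A minimal sketch of how these actionability constraints could be encoded when checking a proposed feature change; the feature indices and function name are illustrative assumptions.

```python
# Hedged sketch: feasibility check for a proposed counterfactual under
# the constraints stated above. Indices 0 and 1 for the two actionable
# features are assumptions for illustration.
EDUCATION = 0    # "education level"
VIOLATIONS = 1   # "number of prison rule violations"
ACTIONABLE = {EDUCATION, VIOLATIONS}

def is_feasible(original: list[float], proposed: list[float]) -> bool:
    """Allow changes only to actionable features; education may only rise."""
    for i, (x, x_new) in enumerate(zip(original, proposed)):
        if i not in ACTIONABLE and x_new != x:
            return False  # non-actionable feature was modified
        if i == EDUCATION and x_new < x:
            return False  # education level can only increase
    return True
```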


Modeling Earth-Scale Human-Like Societies with One Billion Agents

Guan, Haoxiang, He, Jiyan, Fan, Liyang, Ren, Zhenzhen, He, Shaobin, Yu, Xin, Chen, Yuan, Zheng, Shuxin, Liu, Tie-Yan, Liu, Zhen

arXiv.org Artificial Intelligence

Understanding how complex societal behaviors emerge from individual cognition and interactions requires both high-fidelity modeling of human behavior and large-scale simulations. Traditional agent-based models (ABMs) have been employed to study these dynamics for decades, but are constrained by simplified agent behaviors that fail to capture human complexity. Recent advances in large language models (LLMs) offer new opportunities by enabling agents to exhibit sophisticated social behaviors that go beyond rule-based logic, yet face significant scaling challenges. Here we present Light Society, an agent-based simulation framework that advances both fronts, efficiently modeling human-like societies at planetary scale powered by LLMs. Light Society formalizes social processes as structured transitions of agent and environment states, governed by a set of LLM-powered simulation operations and executed through an event queue. This modular design supports both independent and joint component optimization, enabling efficient simulation of societies with over one billion agents. Large-scale simulations of trust games and opinion propagation, spanning up to one billion agents, demonstrate Light Society's high fidelity and efficiency in modeling social trust and information diffusion, while revealing scaling laws whereby larger simulations yield more stable and realistic emergent behaviors.
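
A hedged sketch of the event-queue execution model described above: timestamped events are popped in order and dispatched to handlers that update agent state and may schedule follow-up events. In the real framework each handler would be an LLM-powered simulation operation; the event types here are illustrative assumptions.

```python
# Hedged sketch of event-queue-driven state transitions. The event
# kinds ("message", "reply") and the follow-up scheduling rule are
# assumptions for illustration, not Light Society's actual operations.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float                    # events are ordered by time only
    kind: str = field(compare=False)
    agent_id: int = field(compare=False)

def run_simulation(events: list[Event], state: dict, horizon: float) -> dict:
    queue = list(events)
    heapq.heapify(queue)
    while queue and queue[0].time <= horizon:
        ev = heapq.heappop(queue)
        # In the real system this dispatch would invoke an LLM-powered
        # operation; here we just record the state transition.
        state.setdefault(ev.agent_id, []).append((ev.time, ev.kind))
        if ev.kind == "message":  # handlers may enqueue follow-up events
            heapq.heappush(queue, Event(ev.time + 1.0, "reply", ev.agent_id))
    return state
```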


VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models

Huang, Jen-tse, Qin, Jiantong, Zhang, Jianping, Yuan, Youliang, Wang, Wenxuan, Zhao, Jieyu

arXiv.org Artificial Intelligence

This research investigates both explicit and implicit social biases exhibited by Vision-Language Models (VLMs). The key distinction between these bias types lies in the level of awareness: explicit bias refers to conscious, intentional biases, while implicit bias operates subconsciously. To analyze explicit bias, we directly pose questions to VLMs related to gender and racial differences: (1) multiple-choice questions based on a given image (e.g., "What is the education level of the person in the image?"); and (2) Yes-No comparisons using two images (e.g., "Is the person in the first image more educated than the person in the second image?"). For implicit bias, we design tasks where VLMs assist users but reveal biases through their responses: (1) image description tasks, in which models are asked to describe individuals in images and we analyze disparities in textual cues across demographic groups; and (2) form completion tasks, in which models draft a personal information collection form with 20 attributes and we examine correlations among the selected attributes for potential biases. We evaluate Gemini-1.5, GPT-4V, GPT-4o, LLaMA-3.2-Vision, and LLaVA-v1.6. Our code and data are publicly available at https://github.com/uscnlp-lime/VisBias.
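
The sketch below assembles the two explicit-bias probe types into simple request payloads; the payload shape and the multiple-choice options are illustrative assumptions, not the paper's exact prompts.

```python
# Hedged sketch: building the explicit-bias probes described above.
# The dict payload format and answer options are assumptions; adapt to
# whichever VLM API is being evaluated.
def multiple_choice_probe(image_path: str, attribute: str = "education level") -> dict:
    return {
        "images": [image_path],
        "question": (f"What is the {attribute} of the person in the image? "
                     "(A) High school (B) Bachelor's (C) Master's (D) Doctorate"),
    }

def yes_no_comparison_probe(image_a: str, image_b: str) -> dict:
    return {
        "images": [image_a, image_b],
        "question": ("Is the person in the first image more educated than "
                     "the person in the second image? Answer Yes or No."),
    }
```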


Towards Equitable AI: Detecting Bias in Using Large Language Models for Marketing

Yilmaz, Berk, Ashqar, Huthaifa I.

arXiv.org Artificial Intelligence

The recent advances in large language models (LLMs) have revolutionized industries such as finance, marketing, and customer service by enabling sophisticated natural language processing tasks. However, the broad adoption of LLMs brings significant challenges, particularly in the form of social biases that can be embedded within their outputs. Biases related to gender, age, and other sensitive attributes can lead to unfair treatment, raising ethical concerns and risking both company reputation and customer trust. This study examined bias in finance-related marketing slogans generated by LLMs (i.e., ChatGPT) by prompting the model to generate tailored ads targeting five demographic categories: gender, marital status, age, income level, and education level. A total of 1,700 slogans were generated for 17 unique demographic groups, and key terms were categorized into four thematic groups: empowerment, financial, benefits and features, and personalization. Bias was systematically assessed using relative bias calculations and statistically tested with the Kolmogorov-Smirnov (KS) test against general slogans generated for any individual. Results revealed that marketing slogans are not neutral; rather, they emphasize different themes based on demographic factors. Women, younger individuals, low-income earners, and those with lower education levels receive more distinct messaging compared to older, higher-income, and highly educated individuals. This underscores the need to consider demographic-based biases in AI-generated marketing strategies and their broader societal implications. The findings of this study provide a roadmap for developing more equitable AI systems, highlighting the need for ongoing bias detection and mitigation efforts in LLMs.
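
As a sketch of the statistical procedure described above, the snippet below computes a per-theme relative-bias measure and runs a two-sample Kolmogorov-Smirnov test comparing a demographic group's per-slogan theme scores against those of the general slogans; the exact relative-bias formula is an assumption here.

```python
# Hedged sketch: relative bias per theme plus a KS test against the
# general-slogan distribution. The relative-bias definition below
# (proportional deviation in theme frequency) is an assumption.
import numpy as np
from scipy.stats import ks_2samp

def relative_bias(group_counts: np.ndarray, general_counts: np.ndarray) -> np.ndarray:
    """Proportional over/under-representation of each theme for a group."""
    group_freq = group_counts / group_counts.sum()
    general_freq = general_counts / general_counts.sum()
    return (group_freq - general_freq) / general_freq

def ks_bias_test(group_scores, general_scores, alpha: float = 0.05):
    """Two-sample KS test; True means the distributions differ at level alpha."""
    stat, p_value = ks_2samp(group_scores, general_scores)
    return stat, p_value, p_value < alpha
```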