First Responders' Perceptions of Semantic Information for Situational Awareness in Robot-Assisted Emergency Response

Ruan, Tianshu, Betta, Zoe, Tzoumas, Georgios, Stolkin, Rustam, Chiou, Manolis

arXiv.org Artificial Intelligence

This study investigates First Responders' (FRs) attitudes toward the use of semantic information and Situational Awareness (SA) in robotic systems during emergency operations. A structured questionnaire was administered to 22 FRs across eight countries, capturing their demographic profiles, general attitudes toward robots, and experiences with semantics-enhanced SA. Results show that most FRs expressed positive attitudes toward robots and rated the usefulness of semantic information for building SA at an average of 3.6 out of 5. Semantic information was also valued for its role in predicting unforeseen emergencies (mean 3.9). Participants reported requiring an average of 74.6% accuracy to trust semantic outputs and 67.8% for them to be considered useful, revealing a willingness to use imperfect but informative AI support tools. To the best of our knowledge, this study offers novel insights by being one of the first to directly survey FRs on semantic-based SA in a cross-national context. It reveals the types of semantic information most valued in the field (such as object identity, spatial relationships, and risk context) and connects these preferences to the respondents' roles, experience, and education levels. The findings also expose a critical gap between lab-based robotics capabilities and the realities of field deployment, highlighting the need for more meaningful collaboration between FRs and robotics researchers. These insights contribute to the development of more user-aligned and situationally aware robotic systems for emergency response.


The Biased Oracle: Assessing LLMs' Understandability and Empathy in Medical Diagnoses

Yao, Jianzhou, Liu, Shunchang, Drui, Guillaume, Pettersson, Rikard, Blasimme, Alessandro, Kijewski, Sara

arXiv.org Artificial Intelligence

Large language models (LLMs) show promise for supporting clinicians in diagnostic communication by generating explanations and guidance for patients. Yet their ability to produce outputs that are both understandable and empathetic remains uncertain. We evaluate two leading LLMs on medical diagnostic scenarios, assessing understandability using readability metrics as a proxy and empathy through LLM-as-a-Judge ratings compared against human evaluations. The results indicate that LLMs adapt explanations to socio-demographic variables and patient conditions. However, they also generate overly complex content and display biased affective empathy, leading to uneven accessibility and support. These patterns underscore the need for systematic calibration to ensure equitable patient communication. Code and data are available at https://github.com/Jeffateth/Biased_Oracle
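
The abstract uses readability metrics as the understandability proxy but does not name them; the sketch below, assuming Flesch-style formulas from the textstat package, shows how such a proxy could be scored for a generated explanation.

```python
# Hedged sketch: proxy understandability scores for an LLM-generated
# diagnostic explanation. The specific metrics (Flesch reading ease,
# Flesch-Kincaid grade) are assumptions; the paper does not name its
# exact readability formulas.
import textstat

def readability_profile(explanation: str) -> dict:
    """Score one generated explanation on standard readability formulas."""
    return {
        # Higher is easier to read; roughly 60-70 corresponds to plain English.
        "flesch_reading_ease": textstat.flesch_reading_ease(explanation),
        # Approximate US school grade needed to follow the text.
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(explanation),
    }

if __name__ == "__main__":
    sample = ("Your blood test shows mild anemia, meaning you have fewer "
              "red blood cells than expected. It is often treatable with "
              "iron supplements and dietary changes.")
    print(readability_profile(sample))
```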


AI Debate Aids Assessment of Controversial Claims

Rahman, Salman, Issaka, Sheriff, Suvarna, Ashima, Liu, Genglin, Shiffer, James, Lee, Jaeyoung, Parvez, Md Rizwan, Palangi, Hamid, Feng, Shi, Peng, Nanyun, Choi, Yejin, Michael, Julian, Jiang, Liwei, Gabriel, Saadia

arXiv.org Artificial Intelligence

As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims about COVID-19 and climate change, topics on which people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains.
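
To make the two protocols concrete, the sketch below builds debate and consultancy transcripts around a generic LLM-query callable; the prompts, round structure, and function names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of the two protocols compared in the studies: "debate"
# (two AI advisors argue opposing sides of a claim) versus "consultancy"
# (a single AI advisor). `query` stands in for any LLM API call.
from typing import Callable

Query = Callable[[str], str]  # prompt -> model response

def debate_transcript(claim: str, query: Query, rounds: int = 2) -> str:
    """Two advisors alternate arguments over a fixed number of rounds."""
    transcript = f"Claim under evaluation: {claim}\n"
    for r in range(1, rounds + 1):
        pro = query(f"Argue that the claim is TRUE.\n{transcript}")
        con = query(f"Argue that the claim is FALSE.\n{transcript}")
        transcript += f"[Round {r}] Advisor A (pro): {pro}\n"
        transcript += f"[Round {r}] Advisor B (con): {con}\n"
    return transcript

def consultancy_transcript(claim: str, query: Query) -> str:
    """A single advisor assesses the claim with no opposing voice."""
    advice = query(f"Assess whether this claim is true: {claim}")
    return f"Claim under evaluation: {claim}\n[Advisor]: {advice}\n"

# A judge (a human, or a persona-prompted LLM as in Study II) then reads
# the transcript and returns a verdict plus a confidence, which is
# scored against the claim's ground-truth label.
```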


Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinants

Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry

arXiv.org Artificial Intelligence

Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains (orientation, memory, attention, language, constructional praxis, and executive function) derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline that treats continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified the top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, including age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.
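
The abstract sketches the imputation idea (recover missing values from latent feature correlations) without implementation detail; below is a minimal iterative SVD imputation sketch for the continuous feature block, with the rank and iteration count as illustrative assumptions.

```python
# Minimal sketch of iterative SVD-based imputation for continuous
# features. The paper treats categorical variables separately; that
# branch, and the choices of rank and iteration count, are not
# specified here and are assumptions.
import numpy as np

def svd_impute(X: np.ndarray, rank: int = 5, n_iter: int = 50) -> np.ndarray:
    """Fill NaNs in X by repeatedly projecting onto a rank-k SVD."""
    X = X.astype(float).copy()
    missing = np.isnan(X)
    # Warm start: initialize missing cells with column means.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        X[missing] = low_rank[missing]  # overwrite only the missing cells
    return X
```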


From Limited Data to Rare-event Prediction: LLM-powered Feature Engineering and Multi-model Learning in Venture Capital

Kumar, Mihir, Yin, Aaron Ontoyin, Salifu, Zakari, Amoaba, Kelvin, Samuel, Afriyie Kwesi, Alican, Fuat, Ihlamur, Yigit

arXiv.org Artificial Intelligence

This paper presents a framework for predicting rare, high-impact outcomes by integrating large language models (LLMs) with a multi-model machine learning (ML) architecture. The approach combines the predictive strength of black-box models with the interpretability required for reliable decision-making. We use LLM-powered feature engineering to extract and synthesize complex signals from unstructured data, which are then processed within a layered ensemble of models including XGBoost, Random Forest, and Linear Regression. The ensemble first produces a continuous estimate of success likelihood, which is then thresholded to yield a binary rare-event prediction. We apply this framework to the domain of Venture Capital (VC), where investors must evaluate startups with limited and noisy early-stage data. The empirical results show strong performance: the model achieves precision between 9.8x and 11.1x the random-classifier baseline across three independent test subsets. Feature sensitivity analysis further reveals interpretable success drivers: the startup's category list accounts for 15.6% of predictive influence, followed by the number of founders, while education level and domain expertise contribute smaller yet consistent effects.
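
A minimal sketch of the two-stage design follows: the three named base models are averaged into a continuous success-likelihood score, which is then thresholded into a binary rare-event flag. The equal weighting, hyperparameters, and threshold are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the layered ensemble: continuous scoring stage
# followed by a thresholding stage. LLM-engineered features are assumed
# to already be columns of X_train / X.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

def fit_ensemble(X_train, y_train):
    """Fit the three base models named in the abstract."""
    models = [
        XGBRegressor(n_estimators=200, max_depth=4),
        RandomForestRegressor(n_estimators=300),
        LinearRegression(),
    ]
    for m in models:
        m.fit(X_train, y_train)
    return models

def predict_rare_event(models, X, threshold: float = 0.8):
    """Continuous success-likelihood estimate, then a binary decision."""
    score = np.mean([m.predict(X) for m in models], axis=0)
    return score, (score >= threshold).astype(int)
```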


A Experiment Details

Neural Information Processing Systems

We train our models with the Adam optimizer using a learning rate of 0.002.

A.2 Data Processing

We use the following features for training our models. Our approach allows for customization of actionable features and constraints on their values. We require that education level can only increase. The actionable features are: (i) education level and (ii) the number of prison rule violations reported during the sample sentence.
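
A minimal sketch of how these actionability constraints could be encoded when checking a proposed feature change; the feature indices and function name are illustrative assumptions.

```python
# Hedged sketch: feasibility check for a proposed counterfactual under
# the constraints stated above. Indices 0 and 1 for the two actionable
# features are assumptions for illustration.
EDUCATION = 0    # "education level"
VIOLATIONS = 1   # "number of prison rule violations"
ACTIONABLE = {EDUCATION, VIOLATIONS}

def is_feasible(original: list[float], proposed: list[float]) -> bool:
    """Allow changes only to actionable features; education may only rise."""
    for i, (x, x_new) in enumerate(zip(original, proposed)):
        if i not in ACTIONABLE and x_new != x:
            return False  # non-actionable feature was modified
        if i == EDUCATION and x_new < x:
            return False  # education level can only increase
    return True
```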


Modeling Earth-Scale Human-Like Societies with One Billion Agents

Guan, Haoxiang, He, Jiyan, Fan, Liyang, Ren, Zhenzhen, He, Shaobin, Yu, Xin, Chen, Yuan, Zheng, Shuxin, Liu, Tie-Yan, Liu, Zhen

arXiv.org Artificial Intelligence

Understanding how complex societal behaviors emerge from individual cognition and interactions requires both high-fidelity modeling of human behavior and large-scale simulations. Traditional agent-based models (ABMs) have been employed to study these dynamics for decades, but are constrained by simplified agent behaviors that fail to capture human complexity. Recent advances in large language models (LLMs) offer new opportunities by enabling agents to exhibit sophisticated social behaviors that go beyond rule-based logic, yet face significant scaling challenges. Here we present Light Society, an agent-based simulation framework that advances both fronts, efficiently modeling human-like societies at planetary scale powered by LLMs. Light Society formalizes social processes as structured transitions of agent and environment states, governed by a set of LLM-powered simulation operations and executed through an event queue. This modular design supports both independent and joint component optimization, enabling efficient simulation of societies with over one billion agents. Large-scale simulations of trust games and opinion propagation, spanning up to one billion agents, demonstrate Light Society's high fidelity and efficiency in modeling social trust and information diffusion, while revealing scaling laws whereby larger simulations yield more stable and realistic emergent behaviors.
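
A hedged sketch of the event-queue execution model described above: timestamped events are popped in order and dispatched to handlers that update agent state and may schedule follow-up events. In the real framework each handler would be an LLM-powered simulation operation; the event types here are illustrative assumptions.

```python
# Hedged sketch of event-queue-driven state transitions. The event
# kinds ("message", "reply") and the follow-up scheduling rule are
# assumptions for illustration, not Light Society's actual operations.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float                    # events are ordered by time only
    kind: str = field(compare=False)
    agent_id: int = field(compare=False)

def run_simulation(events: list[Event], state: dict, horizon: float) -> dict:
    queue = list(events)
    heapq.heapify(queue)
    while queue and queue[0].time <= horizon:
        ev = heapq.heappop(queue)
        # In the real system this dispatch would invoke an LLM-powered
        # operation; here we just record the state transition.
        state.setdefault(ev.agent_id, []).append((ev.time, ev.kind))
        if ev.kind == "message":  # handlers may enqueue follow-up events
            heapq.heappush(queue, Event(ev.time + 1.0, "reply", ev.agent_id))
    return state
```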


VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models

Huang, Jen-tse, Qin, Jiantong, Zhang, Jianping, Yuan, Youliang, Wang, Wenxuan, Zhao, Jieyu

arXiv.org Artificial Intelligence

This research investigates both explicit and implicit social biases exhibited by Vision-Language Models (VLMs). The key distinction between these bias types lies in the level of awareness: explicit bias refers to conscious, intentional biases, while implicit bias operates subconsciously. To analyze explicit bias, we directly pose questions to VLMs related to gender and racial differences: (1) multiple-choice questions based on a given image (e.g., "What is the education level of the person in the image?"); and (2) Yes-No comparisons using two images (e.g., "Is the person in the first image more educated than the person in the second image?"). For implicit bias, we design tasks where VLMs assist users but reveal biases through their responses: (1) image description tasks, in which models are asked to describe individuals in images and we analyze disparities in textual cues across demographic groups; and (2) form completion tasks, in which models draft a personal information collection form with 20 attributes and we examine correlations among the selected attributes for potential biases. We evaluate Gemini-1.5, GPT-4V, GPT-4o, LLaMA-3.2-Vision, and LLaVA-v1.6. Our code and data are publicly available at https://github.com/uscnlp-lime/VisBias.
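
The sketch below assembles the two explicit-bias probe types into simple request payloads; the payload shape and the multiple-choice options are illustrative assumptions, not the paper's exact prompts.

```python
# Hedged sketch: building the explicit-bias probes described above.
# The dict payload format and answer options are assumptions; adapt to
# whichever VLM API is being evaluated.
def multiple_choice_probe(image_path: str, attribute: str = "education level") -> dict:
    return {
        "images": [image_path],
        "question": (f"What is the {attribute} of the person in the image? "
                     "(A) High school (B) Bachelor's (C) Master's (D) Doctorate"),
    }

def yes_no_comparison_probe(image_a: str, image_b: str) -> dict:
    return {
        "images": [image_a, image_b],
        "question": ("Is the person in the first image more educated than "
                     "the person in the second image? Answer Yes or No."),
    }
```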


Towards Equitable AI: Detecting Bias in Using Large Language Models for Marketing

Yilmaz, Berk, Ashqar, Huthaifa I.

arXiv.org Artificial Intelligence

The recent advances in large language models (LLMs) have revolutionized industries such as finance, marketing, and customer service by enabling sophisticated natural language processing tasks. However, the broad adoption of LLMs brings significant challenges, particularly in the form of social biases that can be embedded within their outputs. Biases related to gender, age, and other sensitive attributes can lead to unfair treatment, raising ethical concerns and risking both company reputation and customer trust. This study examined bias in finance-related marketing slogans generated by LLMs (i.e., ChatGPT) by prompting the model to generate tailored ads targeting five demographic categories: gender, marital status, age, income level, and education level. A total of 1,700 slogans were generated for 17 unique demographic groups, and key terms were categorized into four thematic groups: empowerment, financial, benefits and features, and personalization. Bias was systematically assessed using relative bias calculations and statistically tested with the Kolmogorov-Smirnov (KS) test against general slogans generated for any individual. Results revealed that marketing slogans are not neutral; rather, they emphasize different themes based on demographic factors. Women, younger individuals, low-income earners, and those with lower education levels receive more distinct messaging compared to older, higher-income, and highly educated individuals. This underscores the need to consider demographic-based biases in AI-generated marketing strategies and their broader societal implications. The findings of this study provide a roadmap for developing more equitable AI systems, highlighting the need for ongoing bias detection and mitigation efforts in LLMs.
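
As a sketch of the statistical procedure described above, the snippet below computes a per-theme relative-bias measure and runs a two-sample Kolmogorov-Smirnov test comparing a demographic group's per-slogan theme scores against those of the general slogans; the exact relative-bias formula is an assumption here.

```python
# Hedged sketch: relative bias per theme plus a KS test against the
# general-slogan distribution. The relative-bias definition below
# (proportional deviation in theme frequency) is an assumption.
import numpy as np
from scipy.stats import ks_2samp

def relative_bias(group_counts: np.ndarray, general_counts: np.ndarray) -> np.ndarray:
    """Proportional over/under-representation of each theme for a group."""
    group_freq = group_counts / group_counts.sum()
    general_freq = general_counts / general_counts.sum()
    return (group_freq - general_freq) / general_freq

def ks_bias_test(group_scores, general_scores, alpha: float = 0.05):
    """Two-sample KS test; True means the distributions differ at level alpha."""
    stat, p_value = ks_2samp(group_scores, general_scores)
    return stat, p_value, p_value < alpha
```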