

"Draw me a curator" Examining the visual stereotyping of a cultural services profession by generative AI

arXiv.org Artificial Intelligence

Based on 230 visualisations, this paper examines the depiction of museum curators by the popular generative Artificial Intelligence (AI) model ChatGPT4o. While the AI-generated representations do not reiterate popular stereotypes of curators as nerdy, conservatively dressed and stuck in time rummaging through collections, they contrast sharply with real-world demographics. The AI-generated imagery severely underrepresents women (3.5% vs 49% to 72% in reality) and omits ethnic communities other than Caucasian (0% vs 18% to 36%). It not only over-represents young curators (79% vs approx. 27%) but also renders curators to resemble yuppie professionals or models in fashion advertising. Stereotypical attributes are prevalent, with curators widely depicted with beards and holding clipboards or digital tablets. The findings highlight biases in the data underlying generative AI image creation, which could shape an inaccurate portrayal of museum professionals if the images were taken uncritically at face value.


Prompt fidelity of ChatGPT4o / Dall-E3 text-to-image visualisations

arXiv.org Artificial Intelligence

This study examines the prompt fidelity of ChatGPT4o / DALL-E3 text-to-image visualisations by analysing whether attributes explicitly specified in autogenously generated prompts are correctly rendered in the resulting images. Using two public-domain datasets comprising 200 visualisations of women working in the cultural and creative industries and 230 visualisations of museum curators, the study assessed accuracy across personal attributes (age, hair), appearance (attire, glasses), and paraphernalia (name tags, clipboards). While most attributes were rendered correctly, DALL-E3 deviated from prompt specifications in 15.6% of all attributes (n=710). Errors were lowest for paraphernalia, moderate for personal appearance, and highest for depictions of the person themselves, particularly age. These findings demonstrate measurable prompt-to-image fidelity gaps, with implications for bias detection and model evaluation.
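The abstract reports deviation rates overall and per attribute category. As a minimal sketch of that bookkeeping, assuming each prompt-specified attribute is coded as one record saying whether it was rendered as specified, the tally could look like this. The `AttributeCheck` record format and the toy records are hypothetical and are not the study's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttributeCheck:
    """One prompt-specified attribute checked against its image (hypothetical record format)."""
    image_id: str
    category: str   # e.g. "person", "appearance", "paraphernalia"
    attribute: str  # e.g. "age", "glasses", "clipboard"
    rendered_as_specified: bool


def deviation_rates(checks):
    """Return (overall deviation rate, {category: deviation rate}) for a list of checks."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for c in checks:
        totals[c.category] += 1
        if not c.rendered_as_specified:
            errors[c.category] += 1
    n = sum(totals.values())
    overall = sum(errors.values()) / n if n else 0.0
    per_category = {cat: errors[cat] / totals[cat] for cat in totals}
    return overall, per_category


# Made-up toy records, NOT data from the study.
toy = [
    AttributeCheck("img1", "person", "age", False),
    AttributeCheck("img1", "appearance", "glasses", True),
    AttributeCheck("img1", "paraphernalia", "clipboard", True),
    AttributeCheck("img2", "person", "hair", True),
]
overall, by_category = deviation_rates(toy)
print(f"overall deviation: {overall:.1%}")  # overall deviation: 25.0%
print(by_category)
```

For scale, the reported 15.6% of n=710 corresponds to roughly 111 deviating attributes (0.156 × 710 ≈ 110.8).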


Using the Pepper Robot to Support Sign Language Communication

arXiv.org Artificial Intelligence

Social robots are increasingly being tested in public and assistive settings, but their accessibility for Deaf users remains underexplored. Italian Sign Language (LIS) is a fully-fledged natural language that relies on complex manual and non-manual components. Enabling robots to communicate in LIS could foster more inclusive human-robot interaction, especially in social environments such as hospitals, airports, or educational settings. This study investigates whether a commercial social robot, Pepper, can produce intelligible LIS signs and short signed LIS sentences. With the help of a Deaf student and his interpreter, an expert in LIS, we co-designed and implemented 52 LIS signs on Pepper using either manual animation techniques or a MATLAB-based inverse kinematics solver. We conducted an exploratory user study involving 12 participants proficient in LIS, both Deaf and hearing. Participants completed a questionnaire featuring 15 single-choice video-based sign recognition tasks and 2 open-ended questions on short signed sentences. Results show that the majority of isolated signs were recognized correctly, although full-sentence recognition was significantly lower due to Pepper's limited articulation and temporal constraints. Our findings demonstrate that even commercially available social robots like Pepper can perform a subset of LIS signs intelligibly, offering opportunities for more inclusive interaction design. Future developments should address multi-modal enhancements (e.g., screen-based support or expressive avatars) and involve Deaf users in participatory design to refine robot expressivity and usability.
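The abstract does not describe the MATLAB solver itself. To illustrate what an inverse kinematics solver computes (joint angles from a desired hand position), here is the textbook closed-form solution for a planar two-link arm. This is a generic sketch: Pepper's arm has more joints and is not planar, and the link lengths below are hypothetical.

```python
import math


def two_link_ik(x, y, l1, l2, elbow_sign=1):
    """Closed-form inverse kinematics for a planar two-link arm.

    Returns joint angles (theta1, theta2) in radians that put the arm tip at (x, y),
    or None if the target is out of reach. elbow_sign (+1 or -1) picks one of the
    two mirror-image solutions.
    """
    # Law of cosines gives the cosine of the elbow angle.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None  # target lies outside the reachable annulus
    s2 = elbow_sign * math.sqrt(1.0 - c2 * c2)
    theta2 = math.atan2(s2, c2)
    # Shoulder angle: direction to the target minus the offset introduced by the elbow.
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2


def forward_kinematics(theta1, theta2, l1, l2):
    """Tip position for given joint angles; used to verify the IK solution."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


upper_arm, forearm = 0.15, 0.18  # hypothetical link lengths in metres, not Pepper's
for sign in (1, -1):
    t1, t2 = two_link_ik(0.2, 0.1, upper_arm, forearm, elbow_sign=sign)
    print(sign, [round(v, 4) for v in forward_kinematics(t1, t2, upper_arm, forearm)])
```

Running forward kinematics on the returned angles is a cheap way to confirm that both elbow solutions actually reach the target.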


Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training

arXiv.org Artificial Intelligence

Reliability is essential for efficient LLM training. However, many real-world reliability issues remain difficult to resolve, resulting in wasted resources and degraded model performance. Unfortunately, today's collective communication libraries operate as black boxes, hiding critical information needed for effective root cause analysis. We propose Mycroft, a lightweight distributed tracing and root cause analysis system designed to address previously hidden reliability issues in collective communication. Mycroft's key idea is to trace collective communication states and leverage internal control and data dependencies to resolve reliability problems in LLM training. Mycroft has been deployed at ByteDance for over six months to debug issues related to collective communication at runtime. It detected anomalies within 15 seconds in 90% of cases and identified the root cause within 20 seconds in 60% of cases. We also conducted extensive fault injection experiments to demonstrate Mycroft's capability and efficiency.
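Mycroft's internal algorithm is not given in this abstract. One generic way to see why dependency information helps with root cause analysis is a wait-for graph: if traces record which peer each rank is blocked on, following the chains leads to the rank everyone is ultimately waiting for. The sketch below assumes such a per-rank `waits_on` mapping is available; the function name and the scenario are hypothetical, and this is an illustration rather than Mycroft's method.

```python
def find_blocking_ranks(waits_on):
    """Follow wait-for chains between ranks.

    waits_on maps each rank to the peer rank it is currently blocked on, or None if
    it is not blocked on a peer (hypothetical input derived from traces).
    Returns (ranks at the end of wait chains, whether a circular wait was found).
    """
    roots = set()
    has_cycle = False
    for start in waits_on:
        visited = set()
        cur = start
        while True:
            nxt = waits_on.get(cur)
            if nxt is None:
                roots.add(cur)  # cur waits on nobody: candidate root cause
                break
            if cur in visited:
                has_cycle = True  # ranks block on each other in a loop
                break
            visited.add(cur)
            cur = nxt
    return roots, has_cycle


# Toy ring of 4 ranks where rank 2 stopped making progress: rank 3 waits on 2,
# rank 0 waits on 3, rank 1 waits on 0 (made-up scenario).
roots, has_cycle = find_blocking_ranks({0: 3, 1: 0, 2: None, 3: 2})
print(roots, has_cycle)  # {2} False
```

In a ring-style collective a single stalled rank makes every other rank look stuck, which is why the chain, rather than any single rank's state, points to the culprit.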


The Impact of Adaptive Emotional Alignment on Mental State Attribution and User Empathy in HRI

arXiv.org Artificial Intelligence

The paper presents an experiment on the effects, in Human-Robot Interaction (HRI), of adaptive emotional alignment between agents, which is considered a prerequisite for empathic communication. Using the NAO robot, we investigate the impact of an emotionally aligned, empathic dialogue on three aspects: (i) the robot's persuasive effectiveness, (ii) the user's communication style, and (iii) the attribution of mental states and empathy to the robot. In an experiment with 42 participants, two conditions were compared: one with neutral communication and another where the robot provided responses adapted to the emotions expressed by the users. The results show that emotional alignment neither influences users' communication styles nor has a persuasive effect. However, it significantly influences the attribution of mental states to the robot and the robot's perceived empathy.


AdaptHetero: Machine Learning Interpretation-Driven Subgroup Adaptation for EHR-Based Clinical Prediction

arXiv.org Machine Learning

Machine learning interpretation (MLI) techniques are increasingly leveraged in the analysis of electronic health records (EHRs) to reveal latent clinical patterns and to support trustworthy, actionable decision-making in high-stakes healthcare settings. However, the intrinsic complexity and heterogeneity of EHR data limit the effectiveness of MLI in guiding subgroup-specific modeling. We propose AdaptHetero, a novel MLI-driven framework that transforms interpretability insights into actionable guidance for tailoring model training and evaluation across subpopulations within individual hospital systems. Evaluated on three large-scale EHR datasets (GOSSIS-1-eICU, WiDS, and MIMIC-IV), AdaptHetero consistently identifies heterogeneous model behaviors in predicting ICU mortality, in-hospital death, and hidden hypoxemia. By integrating SHAP-based interpretation and unsupervised clustering, the framework enhances the identification of clinically meaningful subgroup-specific characteristics, leading to improved predictive performance and optimized clinical deployment.
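The abstract combines SHAP-based interpretation with unsupervised clustering but does not specify the model or clustering algorithm. A minimal sketch of the general idea, under two assumptions not taken from the paper: the model is linear (so SHAP values have the closed form w_j · (x_j − mean_j) when features are independent and the dataset mean is the background), and k-means is the clustering step. Patients are grouped by their attribution vectors rather than by raw features. The data and weights are synthetic; a real pipeline would compute SHAP values for the actual model, e.g. with the `shap` package.

```python
import numpy as np


def linear_shap(X, weights):
    """Exact SHAP values for a linear model f(x) = X @ weights + b, assuming independent
    features and the dataset mean as background: phi[i, j] = w_j * (X[i, j] - mean_j)."""
    return (X - X.mean(axis=0)) * weights


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add the
    point farthest from the centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - points[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return points[idx].copy()


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Synthetic patients (rows) x features (columns) and hypothetical model weights.
X = np.array([[5.0, 0.0], [5.2, 0.1], [4.9, 0.0],
              [0.0, 5.0], [0.1, 5.1], [0.0, 4.8]])
w = np.array([0.8, -0.5])
phi = linear_shap(X, w)
labels, _ = kmeans(phi, k=2)
print(labels)  # rows 0-2 and rows 3-5 land in different clusters
```

Each row of `phi` sums to that patient's prediction minus the mean prediction, which is the additivity property that makes SHAP vectors comparable across patients.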


How Age Influences the Interpretation of Emotional Body Language in Humanoid Robots -- long paper version

arXiv.org Artificial Intelligence

There is a general consensus that body movements and postures provide important cues for identifying emotional states, particularly when facial and vocal signals are unavailable [1]. Emotional Body Language (EBL) is rapidly emerging as a significant area of research within cognitive and affective neuroscience. According to De Gelder [10], numerous valuable insights into human emotion and its neurobiological foundations have been derived from the study of facial expressions. Indeed, certain emotions are more effectively conveyed through facial expressions, while others are better communicated through body movements or a combination of both. Gestures provide observable cues that can be instrumental in recognizing and interpreting a user's emotional state, especially in the absence of verbal or facial signals.


Public Acceptance of Cybernetic Avatars in the service sector: Evidence from a Large-Scale Survey in Dubai

arXiv.org Artificial Intelligence

Cybernetic avatars are hybrid interaction robots or digital representations that combine autonomous capabilities with teleoperated control. This study investigates the acceptance of cybernetic avatars in the highly multicultural society of Dubai, with particular emphasis on robotic avatars for customer service. Specifically, we explore how acceptance varies as a function of robot appearance (e.g., android, robotic-looking, cartoonish), deployment settings (e.g., shopping malls, hotels, hospitals), and functional tasks (e.g., providing information, patrolling). To this end, we conducted a large-scale survey with over 1,000 participants. Overall, cybernetic avatars received a high level of acceptance, with physical robot avatars receiving higher acceptance than digital avatars. In terms of appearance, robot avatars with a highly anthropomorphic robotic appearance were the most accepted, followed by cartoonish designs and androids. Animal-like appearances received the lowest level of acceptance. Among the tasks, providing information and guidance was rated as the most valued. Shopping malls, airports, public transport stations, and museums were the settings with the highest acceptance, whereas healthcare-related spaces received lower levels of support. An analysis by community cluster revealed, among other findings, that Emirati respondents showed significantly greater acceptance of android appearances than the overall sample, while participants from the 'Other Asia' cluster were significantly more accepting of cartoonish appearances. Our study underscores the importance of incorporating citizen feedback into the design and deployment of cybernetic avatars from the early stages to enhance acceptance of this technology in society.


Automated Testing of COBOL to Java Transformation

arXiv.org Artificial Intelligence

Recent advances in Large Language Model (LLM)-based generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to be a correct translation of the original, which makes manual validation of Java code translated from COBOL necessary but time-consuming and labor-intensive. In this paper, we share our experience of developing a testing framework for IBM Watsonx Code Assistant for Z (WCA4Z) [5], an industrial tool designed for COBOL to Java translation. The framework automates testing the functional equivalence of the translated Java code against the original COBOL programs in an industry context. Our framework uses symbolic execution to generate unit tests for COBOL, mocking external calls and transforming them into JUnit tests to validate semantic equivalence with the translated Java. The results not only help identify and repair any detected discrepancies but also provide feedback to improve the AI model.
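The abstract does not show the WCA4Z framework's code. As a simplified sketch of the underlying idea (differential testing with mocked external calls), the example below uses two Python stand-ins for the original and translated programs and picks inputs at branch boundaries by hand, where the real framework derives them with symbolic execution and emits JUnit tests. All function names are hypothetical; only `unittest.mock.Mock` is a real library API.

```python
from unittest.mock import Mock


# Hypothetical stand-ins: "original" plays the COBOL program, "translated" the
# generated Java. Both receive the external call as a parameter so it can be mocked.
def original_fee(balance, lookup_rate):
    rate = lookup_rate("STD")  # external call, mocked during testing
    if balance > 1000:
        return balance * rate // 100
    return 5


def translated_fee(balance, lookup_rate):
    rate = lookup_rate("STD")
    if balance >= 1000:  # deliberately seeded off-by-one translation error
        return balance * rate // 100
    return 5


def boundary_inputs(threshold):
    """Values on both sides of a branch condition; a hand-written stand-in for the
    path-covering inputs a symbolic executor would derive."""
    return [threshold - 1, threshold, threshold + 1]


def differential_test(reference, candidate, inputs, mocked_rate=2):
    """Run both programs with the same mocked external response; return mismatches."""
    mock = Mock(return_value=mocked_rate)
    mismatches = []
    for x in inputs:
        expected, actual = reference(x, mock), candidate(x, mock)
        if expected != actual:
            mismatches.append((x, expected, actual))
    return mismatches


print(differential_test(original_fee, translated_fee, boundary_inputs(1000)))
# [(1000, 5, 20)]
```

Mocking the external call with the same canned response for both programs isolates the comparison to the translated logic, so any mismatch points at the translation rather than at the environment.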


On the usability of generative AI: Human generative AI

arXiv.org Artificial Intelligence

Generative AI systems are transforming content creation, but their usability remains a key challenge. This paper examines usability factors such as user experience, transparency, control, and cognitive load. Common challenges include unpredictability and difficulty in fine-tuning outputs. We review evaluation metrics such as efficiency, learnability, and satisfaction, highlighting best practices from various domains. Improving interpretability, designing intuitive interfaces, and incorporating user feedback can enhance usability, making generative AI more accessible and effective.