non-verbal cue
Enhancing Public Speaking Skills in Engineering Students Through AI
Harsh, Amol, Prince, Brainerd, Siddharth, Siddharth, Muthirayan, Deepan Raj Prabakar, Bhalla, Kabir S, Gupta, Esraaj Sarkar, Sahu, Siddharth
This research-to-practice full paper was inspired by the persistent challenge in effective communication among engineering students. Public speaking is a necessary skill for future engineers as they have to communicate technical knowledge with diverse stakeholders. While universities offer courses or workshops, they are unable to offer sustained and personalized training to students. Providing comprehensive feedback on both verbal and non-verbal aspects of public speaking is time-intensive, making consistent and individualized assessment impractical. This study integrates research on verbal and non-verbal cues in public speaking to develop an AI-driven assessment model for engineering students. Our approach combines speech analysis, computer vision, and sentiment detection into a multi-modal AI system that provides assessment and feedback. The model evaluates (1) verbal communication (pitch, loudness, pacing, intonation), (2) non-verbal communication (facial expressions, gestures, posture), and (3) expressive coherence, a novel integration ensuring alignment between speech and body language. Unlike previous systems that assess these aspects separately, our model fuses multiple modalities to deliver personalized, scalable feedback. Preliminary testing demonstrated that our AI-generated feedback was moderately aligned with expert evaluations. Among the state-of-the-art AI models evaluated, all of which were Large Language Models (LLMs), including Gemini and OpenAI models, Gemini Pro emerged as the best-performing, showing the strongest agreement with human annotators. By eliminating reliance on human evaluators, this AI-driven public speaking trainer enables repeated practice, helping students naturally align their speech with body language and emotion, crucial for impactful and professional communication.
Beyond Words: Enhancing Desire, Emotion, and Sentiment Recognition with Non-Verbal Cues
Chen, Wei, Wang, Tongguan, Xue, Feiyue, Li, Junkai, Liu, Hui, Sha, Ying
Desire, as an intention that drives human behavior, is closely related to both emotion and sentiment. Multimodal learning has advanced sentiment and emotion recognition, but multimodal approaches specifically targeting human desire understanding remain underexplored. Moreover, existing methods in sentiment analysis predominantly emphasize verbal cues and overlook images as complementary non-verbal cues. To address these gaps, we propose a Symmetrical Bidirectional Multimodal Learning Framework for Desire, Emotion, and Sentiment Recognition, which enforces mutual guidance between text and image modalities to effectively capture intention-related representations in the image. Specifically, low-resolution images are used to obtain global visual representations for cross-modal alignment, while high-resolution images are partitioned into sub-images and modeled with masked image modeling to enhance the ability to capture fine-grained local features. A text-guided image decoder and an image-guided text decoder are introduced to facilitate deep cross-modal interaction at both local and global representations of image information. Additionally, to balance perceptual gains with computation cost, a mixed-scale image strategy is adopted, where high-resolution images are cropped into sub-images for masked modeling. The proposed approach is evaluated on MSED, a multimodal dataset that includes a desire understanding benchmark, as well as emotion and sentiment recognition. Experimental results indicate consistent improvements over other state-of-the-art methods, validating the effectiveness of our proposed method. Specifically, our method outperforms existing approaches, achieving F1-score improvements of 1.1% in desire understanding, 0.6% in emotion recognition, and 0.9% in sentiment analysis. Our code is available at: https://github.com/especiallyW/SyDES.
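The sub-image partitioning and masking step described in this abstract can be illustrated with a minimal sketch. It is not taken from the SyDES repository: the function names, the 448-pixel image size, the 2x2 grid, the 16-pixel patch size, and the 50% mask ratio are all assumptions chosen for illustration, and the sketch only computes crop boxes and masked patch indices rather than training a model.

```python
import random


def subimage_boxes(height, width, grid):
    """Return (top, left, bottom, right) crop boxes for a grid x grid split.

    Hypothetical helper: the abstract does not state how many sub-images
    are used, so `grid` is an illustrative parameter.
    """
    if height % grid or width % grid:
        raise ValueError("height and width must be divisible by grid")
    sub_h, sub_w = height // grid, width // grid
    return [
        (r * sub_h, c * sub_w, (r + 1) * sub_h, (c + 1) * sub_w)
        for r in range(grid)
        for c in range(grid)
    ]


def choose_masked_patches(box, patch, ratio, rng):
    """Pick a random subset of patch coordinates inside one sub-image.

    Returns a set of (row, col) patch indices to hide from the encoder,
    which is the selection step of masked image modeling.
    """
    top, left, bottom, right = box
    rows = (bottom - top) // patch
    cols = (right - left) // patch
    all_patches = [(r, c) for r in range(rows) for c in range(cols)]
    n_masked = round(ratio * len(all_patches))
    return set(rng.sample(all_patches, n_masked))


# Assumed sizes: a 448x448 high-resolution image split into 2x2 sub-images,
# each covered by 16x16 patches, half of which are masked.
boxes = subimage_boxes(448, 448, grid=2)
masked = choose_masked_patches(boxes[0], patch=16, ratio=0.5, rng=random.Random(0))
```

Splitting before masking keeps each sub-image at a size the encoder can process cheaply, which is one plausible reading of the "balance perceptual gains with computation cost" remark in the abstract.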
Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task
Ali, Hassan, Allgeuer, Philipp, Wermter, Stefan
Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collaborative object categorization task with a physical robot. We introduce a hierarchical approach for interpreting user non-verbal cues, like hand gestures, body poses, and facial expressions, and combining them with environment states and user verbal cues captured using an existing Automatic Speech Recognition (ASR) system. Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities and real-world knowledge to support intention prediction during human-robot interaction.
Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction
The rise of Large Language Models (LLMs) has affected various disciplines that go beyond mere text generation. Going beyond their textual nature, this project proposal aims to investigate the interaction between LLMs and non-verbal communication, specifically focusing on gestures. The proposal sets out a plan to examine the proficiency of LLMs in deciphering both explicit and implicit non-verbal cues within textual prompts and their ability to associate these gestures with various contextual factors. The research proposes to test established psycholinguistic study designs to construct a comprehensive dataset that pairs textual prompts with detailed gesture descriptions, encompassing diverse regional variations, and semantic labels. To assess LLMs' comprehension of gestures, experiments are planned, evaluating their ability to simulate human behaviour in order to replicate psycholinguistic experiments. These experiments consider cultural dimensions and measure the agreement between LLM-identified gestures and the dataset, shedding light on the models' contextual interpretation of non-verbal cues (e.g. gestures).
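The proposal does not say which statistic would be used to measure agreement between LLM-identified gestures and the dataset labels. As one possibility, the sketch below computes Cohen's kappa, a standard chance-corrected agreement score for two sets of categorical labels. The function name and the gesture labels in the usage example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each source's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both sources used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: gestures named by an LLM vs. dataset annotations.
llm_labels = ["nod", "wave", "nod", "shrug"]
dataset_labels = ["nod", "wave", "shrug", "shrug"]
kappa = cohens_kappa(llm_labels, dataset_labels)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the score can be compared across label sets with different class balances.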
Mastering the art of effective communication skills
Communication is the bedrock of human interaction, influencing every facet of our lives -- from our personal connections to our professional endeavors. Beyond being a beneficial skill, effective communication stands as a vital asset in shaping the depth of our relationships, steering the course of our careers and serving as an incentive for personal growth and fulfillment. Communication resonates far beyond mere conversation; it's the foundation that underpins our connections, aspirations and journey toward self-improvement.
The complementary roles of non-verbal cues for Robust Pronunciation Assessment
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within the non-verbal cues. In this study, we proposed a novel pronunciation assessment framework, IntraVerbalPA. Numerous investigations have explored a range of features and modeling approaches aimed at enhancing modeling performance. These explorations have encompassed the utilization of Goodness-of-Pronunciation (GOP) metrics [4, 5, 6], the integration of manually crafted handful of non-verbal features such as duration, energy, and pitch [7, 8, 9], as well ...
Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models
Lee, Yoon Kyung, Jung, Yoonwon, Kang, Gyuyi, Hahn, Sowon
We propose augmenting the empathetic capacities of social robots by integrating non-verbal cues. Our primary contribution is the design and labeling of four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot. These cues are generated using a Large Language Model (LLM). We developed an LLM-based conversational system for the robot and assessed its alignment with social cues as defined by human counselors. Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures. Despite these tendencies, our approach has led to the development of a social robot capable of context-aware and more authentic interactions. Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
Understanding the Uncertainty Loop of Human-Robot Interaction
Leusmann, Jan, Wang, Chao, Gienger, Michael, Schmidt, Albrecht, Mayer, Sven
The field of Human-Robot Interaction has recently gained popularity due to the wide range of ways in which robots can support humans during daily tasks. One form of supportive robots are socially assistive robots, which are specifically built for communicating with humans, e.g., as service robots or personal companions. As they understand humans through artificial intelligence, these robots will at some point make wrong assumptions about the humans' current state and give an unexpected response. In human-human conversations, unexpected responses happen frequently. However, it is currently unclear how such robots should act if they understand that the human did not expect their response, or whether they should even show the uncertainty of their response in the first place. For this, we explore the different forms of potential uncertainties during human-robot conversations and how humanoids can, through verbal and non-verbal cues, communicate these uncertainties.
This $5 billion insurance company likes to talk up its AI. Now it's in a mess over it
A key part of insurance company Lemonade's pitch to investors and customers is its ability to disrupt the normally staid insurance industry with artificial intelligence. It touts friendly chatbots like AI Maya and AI Jim, which help customers sign up for policies for things like homeowners' or pet health insurance, and file claims through Lemonade's app. And it has raised hundreds of millions of dollars from public and private market investors, in large part by positioning itself as an AI-powered tool. Yet less than a year after its public market debut, the company, now valued at $5 billion, finds itself in the middle of a PR controversy related to the technology that underpins its services. On Twitter and in a blog post on Wednesday, Lemonade explained why it deleted what it called an "awful thread" of tweets it had posted on Monday. Those now-deleted tweets had said, among other things, that the company's AI analyzes the videos that users submit when they file insurance claims for signs of fraud, picking up "non-verbal cues that traditional insurers can't, since they don't use a digital claims process."
A disturbing, viral Twitter thread reveals how AI-powered insurance can go wrong
Lemonade, the fast-growing, machine learning-powered insurance app, put out a real lemon of a Twitter thread on Monday with a proud declaration that its AI analyzes videos of customers when determining if their claims are fraudulent. The company has been trying to explain itself and its business model -- and fend off serious accusations of bias, discrimination, and general creepiness -- ever since. The prospect of being judged by AI for something as important as an insurance claim was alarming to many who saw the thread, and it should be. We've seen how AI can discriminate against certain races, genders, economic classes, and disabilities, among other categories, leading to those people being denied housing, jobs, education, or justice. Now we have an insurance company that prides itself on largely replacing human brokers and actuaries with bots and AI, collecting data about customers without them realizing they were giving it away, and using those data points to assess their risk.