Skantze, Gabriel
What Can You Say to a Robot? Capability Communication Leads to More Natural Conversations
Reimann, Merle M., Hindriks, Koen V., Kunneman, Florian A., Oertel, Catharine, Skantze, Gabriel, Leite, Iolanda
When encountering a robot in the wild, it is not inherently clear to human users what the robot's capabilities are. When misunderstandings or problems arise in spoken interaction, robots often just apologize and move on, without making any additional effort to ensure the user understands what happened. We set out to compare the effect of two speech-based capability communication strategies (proactive, reactive) against a robot without such a strategy, with regard to users' ratings of the interaction and their behavior during it. To this end, we conducted an in-person user study in which 120 participants had three speech-based interactions with a social robot in a restaurant setting. Our results suggest that users preferred the robot that communicated its capabilities proactively and adjusted their behavior in those interactions, adopting a more conversational interaction style while also enjoying the interaction more.
Applying General Turn-taking Models to Conversational Human-Robot Interaction
Skantze, Gabriel, Irfan, Bahar
Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
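As a rough illustration of how two such models might be combined in a live system, the sketch below (in Python) fuses a text-based end-of-turn probability (in the spirit of TurnGPT) with an audio-based turn-shift probability (in the spirit of VAP) into a simple wait/prepare/speak decision. The class, thresholds, and dummy models are illustrative assumptions, not the system evaluated in the paper.

```python
# A minimal sketch (not the authors' implementation) of combining two
# turn-taking signals: a text-based end-of-turn probability (TurnGPT-style)
# and an audio-based turn-shift probability (VAP-style). The thresholds and
# callables below are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class TurnTakingPolicy:
    text_eot_prob: Callable[[str], float]      # P(utterance is complete | transcript so far)
    audio_shift_prob: Callable[[bytes], float] # P(turn shift | recent stereo audio)
    prepare_threshold: float = 0.5             # start generating a candidate response early
    take_turn_threshold: float = 0.7           # actually take the turn

    def step(self, transcript: str, audio_window: bytes) -> str:
        """Return one of 'wait', 'prepare', or 'speak' for the current frame."""
        p_text = self.text_eot_prob(transcript)
        p_audio = self.audio_shift_prob(audio_window)
        if p_text >= self.prepare_threshold and p_audio >= self.take_turn_threshold:
            return "speak"      # both signals agree: take the turn now
        if p_text >= self.prepare_threshold:
            return "prepare"    # likely complete: pre-generate a response, keep listening
        return "wait"           # user is probably still holding the turn

# Toy usage with dummy models standing in for TurnGPT and VAP:
policy = TurnTakingPolicy(
    text_eot_prob=lambda t: 0.9 if t.endswith("?") else 0.2,
    audio_shift_prob=lambda a: 0.8,
)
print(policy.step("how are you doing today?", b"\x00" * 320))  # -> 'speak'
```

Separating 'prepare' from 'speak' is what lets response generation (for example, an LLM call) start early without committing the robot to taking the turn.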
Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection
Inoue, Koji, Lala, Divesh, Skantze, Gabriel, Kawahara, Tatsuya
In human conversations, short backchannel utterances such as "yeah" and "oh" play a crucial role in facilitating smooth and engaging dialogue. These backchannels signal attentiveness and understanding without interrupting the speaker, making their accurate prediction essential for creating more natural conversational agents. This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection (VAP) model. While existing approaches have relied on turn-based or artificially balanced datasets, our approach predicts both the timing and type of backchannels in a continuous and frame-wise manner on unbalanced, real-world datasets. We first pre-train the VAP model on a general dialogue corpus to capture conversational dynamics and then fine-tune it on a specialized dataset focused on backchannel behavior. Experimental results demonstrate that our model outperforms baseline methods in both timing and type prediction tasks, achieving robust performance in real-time environments. This research offers a promising step toward more responsive and human-like dialogue systems, with implications for interactive spoken dialogue applications such as virtual assistants and robots.
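To make the frame-wise formulation concrete, here is a minimal Python sketch of how continuous per-frame predictions could be turned into discrete backchannel events of a given type. The model call is replaced with random probabilities, and the frame rate, type inventory, threshold, and refractory period are illustrative assumptions rather than the paper's actual configuration.

```python
# A minimal sketch (assumptions, not the paper's code) of turning frame-wise
# backchannel probabilities into discrete events. It assumes a fine-tuned
# VAP-style model that outputs, every frame, a distribution over
# {none, continuer ("yeah"/"un"), assessment ("oh")}.

import numpy as np

FRAME_HZ = 50                  # assumed model frame rate
TYPES = ["none", "continuer", "assessment"]

def detect_backchannels(frame_probs: np.ndarray, threshold: float = 0.6,
                        refractory_s: float = 1.0) -> list[tuple[float, str]]:
    """frame_probs: (T, 3) per-frame probabilities. Returns (time_s, type) events."""
    events, last_t = [], -np.inf
    for i, p in enumerate(frame_probs):
        t = i / FRAME_HZ
        k = int(np.argmax(p))
        # fire only on confident non-'none' frames, with a refractory period
        # so a single backchannel is not emitted many frames in a row
        if k != 0 and p[k] >= threshold and (t - last_t) >= refractory_s:
            events.append((t, TYPES[k]))
            last_t = t
    return events

# Toy usage with random values standing in for model output:
rng = np.random.default_rng(0)
probs = rng.dirichlet([5, 1, 1], size=200)   # mostly 'none'
print(detect_backchannels(probs))
```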
Perception of Emotions in Human and Robot Faces: Is the Eye Region Enough?
Mishra, Chinmaya, Skantze, Gabriel, Hagoort, Peter, Verdonschot, Rinus
The increased interest in developing next-generation social robots has raised questions about the factors affecting the perception of robot emotions. This study investigates the impact of robot appearance (humanlike, mechanical) and face region (full face, eye region) on human perception of robot emotions. A between-subjects user study (N = 305) was conducted in which participants were asked to identify the emotions displayed in videos of robot faces, as well as a human baseline. Our findings reveal three important insights for effective social robot face design in Human-Robot Interaction (HRI): Firstly, robots equipped with a back-projected, fully animated face, regardless of whether they are more human-like or more mechanical-looking, demonstrate a capacity for emotional expression comparable to that of humans. Secondly, the recognition accuracy of emotional expressions in both humans and robots declines when only the eye region is visible. Lastly, when only the eye region is visible, robots with more human-like features significantly enhance emotion recognition.
Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES)
Irfan, Bahar, Miniota, Jura, Thunberg, Sofia, Lagerstedt, Erik, Kuoppamäki, Sanna, Skantze, Gabriel, Pereira, André
Understanding user enjoyment is crucial in human-robot interaction (HRI), as it can impact interaction quality and influence user acceptance and long-term engagement with robots, particularly in the context of conversations with social robots. However, current assessment methods rely solely on self-reported questionnaires, failing to capture interaction dynamics. This work introduces the Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES), a novel scale for assessing user enjoyment from an external perspective during conversations with a robot. Developed through rigorous evaluation and discussion among three annotators with relevant expertise, the scale provides a structured framework for assessing enjoyment in each conversational exchange (turn) as well as for the interaction overall. It aims to complement users' self-reported enjoyment and holds potential for autonomously identifying user enjoyment in real-time HRI. The scale was validated on open-domain dialogues between 25 older adults and a companion robot powered by a large language model, corresponding to 174 minutes of data, and showed moderate to good alignment. The dataset is available online. Additionally, the study offers insights into the nuances and challenges of assessing user enjoyment in robot interactions and provides guidelines for applying the scale to other domains.
Joint Learning of Context and Feedback Embeddings in Spoken Dialogue
Qian, Livia, Skantze, Gabriel
Short feedback responses, such as backchannels, play an important role in spoken dialogue. So far, most modeling of feedback responses has focused on their timing, often neglecting how their lexical and prosodic form influence their contextual appropriateness and conversational function. In this paper, we investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective. In our evaluation, we primarily focus on how such embeddings can be used as a context-feedback appropriateness metric and thus for feedback response ranking in U.S. English dialogues. Our results show that the model outperforms humans on the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
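A minimal sketch of the general recipe, in PyTorch, is given below: paired context and feedback embeddings are pulled together with a symmetric in-batch contrastive (InfoNCE-style) loss, and the resulting similarity scores can then be used to rank candidate feedback responses for a given context. The encoders are replaced by random tensors, and the temperature and dimensions are assumptions, not the paper's actual model.

```python
# A minimal sketch of the general idea (in-batch contrastive learning), not the
# paper's actual model: a context encoder and a feedback encoder are trained so
# that matching (context, feedback) pairs score higher than mismatched ones.

import torch
import torch.nn.functional as F

def contrastive_loss(ctx_emb: torch.Tensor, fb_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """ctx_emb, fb_emb: (B, D) embeddings of paired contexts and feedback responses."""
    ctx = F.normalize(ctx_emb, dim=-1)
    fb = F.normalize(fb_emb, dim=-1)
    logits = ctx @ fb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(ctx.size(0))          # diagonal entries are the true pairs
    # symmetric InfoNCE: context -> feedback and feedback -> context
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def rank_feedback(ctx_emb: torch.Tensor, candidate_embs: torch.Tensor) -> torch.Tensor:
    """Rank candidate feedback responses for one context by cosine similarity."""
    sims = F.normalize(ctx_emb, dim=-1) @ F.normalize(candidate_embs, dim=-1).t()
    return sims.argsort(descending=True)

# Toy usage with random embeddings standing in for encoder outputs:
ctx, fb = torch.randn(8, 256), torch.randn(8, 256)
print(contrastive_loss(ctx, fb).item())
print(rank_feedback(ctx[:1], fb))
```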
Multilingual Turn-taking Prediction Using Voice Activity Projection
Inoue, Koji, Jiang, Bing'er, Ekstedt, Erik, Kawahara, Tatsuya, Skantze, Gabriel
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, to multilingual data encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the model's sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two different audio encoders: contrastive predictive coding (CPC) pre-trained on English, and a recent model based on multilingual wav2vec 2.0 (MMS).
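For readers unfamiliar with the model shape, the following PyTorch sketch mirrors the structure described above: per-speaker audio features pass through self-attention, the two channels are combined with cross-attention, and a head predicts a distribution over discretized future voice-activity patterns. The layer sizes and the simple linear projection standing in for the CPC/MMS encoder are assumptions; this is not the released VAP implementation.

```python
# A structural sketch (assumptions, not the released VAP code) of the model
# shape: per-speaker features -> self-attention -> cross-attention between
# channels -> per-frame distribution over discretized future voice activity.

import torch
import torch.nn as nn

class VAPSketch(nn.Module):
    def __init__(self, feat_dim: int = 256, d_model: int = 256, n_bins: int = 256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)       # stands in for CPC / MMS features
        layer = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.self_a, self.self_b = layer(), layer()    # per-channel self-attention
        self.cross = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_bins)         # discretized future voice activity

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        a = self.self_a(self.proj(feats_a))            # (B, T, d_model)
        b = self.self_b(self.proj(feats_b))
        fused, _ = self.cross(a, b, b)                 # speaker A attends to speaker B
        return self.head(fused)                        # (B, T, n_bins) logits per frame

# Toy usage: one dialogue, 100 frames of 256-dim features per speaker
model = VAPSketch()
print(model(torch.randn(1, 100, 256), torch.randn(1, 100, 256)).shape)
```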
An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems
Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya, Skantze, Gabriel
Establishing evaluation schemes for spoken dialogue systems is important, but it can also be challenging. While subjective evaluations are commonly used in user experiments, objective evaluations are necessary for research comparison and reproducibility. To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors. To this end, we investigate the relationship between user behaviors and subjective evaluation scores in three social dialogue tasks: attentive listening, job interview, and first-meeting conversation. The results reveal that in dialogue tasks where user utterances are primary, such as attentive listening and the job interview, indicators like the number of utterances and words play a significant role in evaluation. Observed disfluency can also indicate the effectiveness of formal tasks, such as the job interview. On the other hand, in highly interactive dialogue tasks, such as first-meeting conversation, behaviors related to turn-taking, such as average switch-pause length, become more important. These findings suggest that selecting appropriate user behaviors can provide valuable insights for objective evaluation in each social dialogue task.
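The kind of behavior-based indicators discussed here are straightforward to compute; the Python sketch below derives utterance count, word count, and average switch-pause length from a toy turn log and correlates one indicator with subjective scores. The data layout and the choice of Spearman correlation are illustrative assumptions, not the paper's exact pipeline.

```python
# A minimal sketch of behavior-based indicators (utterance count, word count,
# average switch-pause length) correlated with subjective ratings. The data
# layout and the use of Spearman correlation are illustrative assumptions.

import numpy as np
from scipy.stats import spearmanr

def dialogue_indicators(user_turns: list[dict]) -> dict:
    """user_turns: [{'text': str, 'start': float, 'end': float, 'prev_sys_end': float}, ...]"""
    n_words = sum(len(t["text"].split()) for t in user_turns)
    switch_pauses = [t["start"] - t["prev_sys_end"] for t in user_turns]
    return {"n_utterances": len(user_turns),
            "n_words": n_words,
            "mean_switch_pause": float(np.mean(switch_pauses)) if switch_pauses else 0.0}

# Toy usage: correlate one indicator with subjective scores across dialogues
dialogues = [[{"text": "yes I think so", "start": 5.2, "end": 6.0, "prev_sys_end": 4.9}],
             [{"text": "well maybe", "start": 3.1, "end": 3.6, "prev_sys_end": 2.2}],
             [{"text": "no", "start": 8.0, "end": 8.2, "prev_sys_end": 6.5}]]
subjective_scores = [4.0, 3.0, 2.0]
pauses = [dialogue_indicators(d)["mean_switch_pause"] for d in dialogues]
rho, p = spearmanr(pauses, subjective_scores)
print(rho, p)
```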
Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection
Inoue, Koji, Jiang, Bing'er, Ekstedt, Erik, Kawahara, Tatsuya, Skantze, Gabriel
A demonstration of a real-time, continuous turn-taking prediction system is presented. The system is based on a voice activity projection (VAP) model, which directly maps stereo dialogue audio to future voice activities. The VAP model includes contrastive predictive coding (CPC) and self-attention Transformers, followed by a cross-attention Transformer. We examine the effect of the input context audio length and demonstrate that the proposed system can operate in real time on a CPU with minimal performance degradation.
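The real-time constraint essentially amounts to keeping the input context bounded and ensuring each prediction step finishes within one frame interval. The Python sketch below illustrates that pattern with a ring buffer of recent stereo audio and a dummy model standing in for VAP; the window length, step size, and model are assumptions, not the demonstrated system.

```python
# A minimal sketch (assumptions, not the demo code) of the real-time aspect:
# keep only a bounded window of recent stereo audio as model context, run the
# predictor on that window each step, and check that one step fits inside the
# frame interval.

import time
from collections import deque

import numpy as np

SAMPLE_RATE = 16_000
STEP_S = 0.02                      # run the model every 20 ms
CONTEXT_S = 20.0                   # bounded input context (shorter = cheaper)

def dummy_vap(stereo_audio: np.ndarray) -> float:
    """Placeholder for a VAP model: returns a fake turn-shift probability."""
    return float(np.clip(np.abs(stereo_audio).mean() * 10, 0, 1))

buffer = deque(maxlen=int(CONTEXT_S / STEP_S))   # ring buffer of 20 ms stereo chunks
for _ in range(50):                              # simulate 1 s of streaming audio
    chunk = np.random.randn(int(SAMPLE_RATE * STEP_S), 2) * 0.01
    buffer.append(chunk)
    t0 = time.perf_counter()
    p_shift = dummy_vap(np.concatenate(buffer))  # predict from the bounded context only
    elapsed = time.perf_counter() - t0
    assert elapsed < STEP_S, "not keeping up with real time"
print(f"last turn-shift probability: {p_shift:.2f}")
```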
Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors
Inoue, Koji, Lala, Divesh, Ochi, Keiko, Kawahara, Tatsuya, Skantze, Gabriel
This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors. In this study, our main focus is on assessing the human-likeness of the robot as the primary evaluation metric. While previous research has often relied on subjective evaluations from users, our approach aims to evaluate the robot's human-likeness indirectly, based on observable user behaviors, thus enhancing objectivity and reproducibility. To begin, we created an annotated dataset of human-likeness scores, utilizing user behaviors found in an attentive listening dialogue corpus. We then conducted an analysis to determine the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of our proposed behavior-based evaluation method.