Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions

Kim, JiWoo, Chang, Minsuk, Bak, JinYeong

arXiv.org Artificial Intelligence

Traditional text-based human-AI interactions often adhere to a strict turn-taking approach. In this research, we propose a novel approach that incorporates overlapping messages, mirroring natural human conversations. Through a formative study, we observed that even in text-based contexts, users instinctively engage in overlapping behaviors like "A: Today I went to-" "B: yeah." To capitalize on these insights, we developed OverlapBot, a prototype chatbot where both AI and users can initiate overlapping. Our user study revealed that OverlapBot was perceived as more communicative and immersive than a traditional turn-taking chatbot, fostering faster and more natural interactions. Our findings contribute to the understanding of the design space for overlapping interactions. We also provide recommendations for implementing overlap-capable AI interactions to enhance the fluidity and engagement of text-based conversations.


Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

Wang, Jinhan, Chen, Long, Khare, Aparna, Raju, Anirudh, Dheram, Pranav, He, Di, Wu, Minhua, Stolcke, Andreas, Ravichandran, Venkatesh

arXiv.org Artificial Intelligence

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms single-modality baseline models. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements. Our approach demonstrates the potential of combined LLMs and acoustic models for a more natural and conversational interaction between humans and speech-enabled AI agents.
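
The fusion idea above can be sketched as a late-fusion classifier: concatenate an acoustic feature vector with an LLM-derived text embedding and apply a linear layer plus softmax over the turn-taking labels. This is a minimal illustrative sketch, not the paper's actual model; the feature names, dimensions, labels, and weights are all assumptions.

```python
# Hedged sketch: late fusion of acoustic features and an LLM text
# embedding for turn-taking / backchannel prediction. All dimensions,
# labels, and weights are illustrative assumptions.
import math

LABELS = ["hold", "turn-shift", "backchannel"]

def fuse_and_predict(acoustic_feats, text_embedding, weights, bias):
    """Concatenate both modalities, apply a linear layer, then softmax."""
    fused = list(acoustic_feats) + list(text_embedding)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs
```

In a real system the weights would be learned and the text embedding would come from the fine-tuned LLM; here a toy call such as `fuse_and_predict([0.2, 0.9], [0.1, -0.3, 0.5], W, b)` just demonstrates the fusion-and-classify shape.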


Turn-Taking with Improvisational Co-Creative Agents

Winston, Lauren (Georgia Institute of Technology) | Magerko, Brian (Georgia Institute of Technology)

AAAI Conferences

Turn-taking is the ability for agents to lead or follow in social interactions. Turn-taking between humans and intelligent agents has been studied in human-robot interaction but has not been applied to improvisational, dance-based interactions. User understanding and experience of turn-taking in an improvisational, dance-based system known as LuminAI was investigated in a preliminary study of 11 participants. The results showed a trend towards users understanding the difference between turn-taking and non-turn-taking versions of LuminAI but reduced user experience in the turn-taking version.


Turn-Taking, Children, and the Unpredictability of Fun

Lehman, Jill Fain (Disney Research) | Leite, Iolanda (Disney Research)

AI Magazine

When the underlying assumptions of commonality of purpose and content break down, the interaction does as well. A great deal of the art of interaction design lies in minimizing what is, from the agent's point of view, out-of-task behavior, both by anticipating natural in-task communication and by providing cues to lead participants down the predicted paths. Anticipation and cueing are particularly important in designing interactions for young children, a population that is limited in its ability to understand and adapt to the bounds of a system when things go awry. Most speech and natural language research that focuses on this population has pedagogy (Ogan et al. 2012; Gordon and Breazeal 2015) or therapy as its focus. As explained briefly by Edith, there are two main game actions: effecting a change to the model by naming one of the clothing items or accessories on the board, and requesting a picture of the increasingly crazily clad model to be printed and taken home afterward. The majority of the interaction consists of 20 choice cycles, during each of which a valid reference to a board item is made, the model changes, and a replacement item appears.


Signalizing and Predicting Turn-Taking in Multilingual Contexts: Using Data from Transcribed International Spoken Journalistic Texts in Human-Robot Interaction

Alexandris, Christina (National University of Athens)

AAAI Conferences

Data from transcribed spoken journalistic texts from international news networks is employed in the signalization and prediction of turn-taking in Human-Computer Interaction and Human-Robot Interaction in multilingual contexts, taking into account the verbal and non-verbal behavior of international speakers.


Turn-Taking in Commander-Robot Navigator Dialog

Cassidy, Taylor (US Army Research Laboratory) | Voss, Clare (US Army Research Laboratory) | Summers-Stay, Douglas (US Army Research Laboratory)

AAAI Conferences

We seek to develop a robot that will be capable of teaming with humans to accomplish physical exploration tasks that would not otherwise be possible in dynamic, dangerous environments. For such tasks, a human commander needs to be able to communicate with a robot that moves out of sight and relays information back to the commander. What is the best way to determine how a human commander would interact in a multi-modal spoken dialog with such a robot to accomplish tasks? In this paper, we describe our initial approach to discovering a principled basis for coordinating turn-taking, perception, and navigational behavior of a robot in communication with a commander, by identifying decision phases in dialogs collected in a WoZ framework. We present two types of utterance annotation with examples applied to task-oriented dialog between a human commander and a human "robot navigator" who controls the physical robot in a realistic environment similar to expected actual conditions. We discuss core robot capabilities that bear on the robot navigator's ability to take turns while performing a "find the building doors" task at hand. The paper concludes with a brief overview of ongoing work to implement these decision phases within an open-source dialog management framework, constructing a task tree specification and dialog control logic for our application domain.
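
The abstract mentions constructing a task tree specification for dialog control. A minimal sketch of what such a structure might look like is below; the node fields, task names, and traversal policy are assumptions for illustration, not the authors' actual specification.

```python
# Hedged sketch of a task-tree node for dialog control, inspired by the
# task tree specification the abstract mentions. Field names and the
# depth-first "next pending task" policy are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    children: list = field(default_factory=list)
    done: bool = False

    def next_pending(self):
        """Depth-first search for the first incomplete leaf task."""
        if not self.children:
            return None if self.done else self
        for child in self.children:
            leaf = child.next_pending()
            if leaf is not None:
                return leaf
        return None
```

A dialog manager could use `next_pending()` to decide which subtask to ground the next commander-robot exchange in, e.g. for a hypothetical "find the building doors" mission decomposed into "approach-building" and "scan-facade" subtasks.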


Turn-Taking in Commander-Robot Navigator Dialog (Video Abstract)

Cassidy, Taylor (US Army Research Laboratory) | Voss, Clare (US Army Research Laboratory) | Summers-Stay, Douglas (US Army Research Laboratory)

AAAI Conferences

The accompanying video captures the multi-modal data displays and speech dialogue of a human Commander (C) and a human Robot Navigator (RN) tele-operating a mobile robot (R) in a remote, previously unexplored area. We describe unique challenges for automation of turn-taking and coordination processes observed in the data.


On the Challenges and Opportunities of Physically Situated Dialog

Bohus, Dan (Microsoft Research) | Horvitz, Eric (Microsoft Research)

AAAI Conferences

We outline several challenges and opportunities for building physically situated systems that can interact in open, dynamic, and relatively unconstrained environments. We review a platform and recent progress on developing computational methods for situated, multiparty, open-world dialog, and highlight the value of representations of the physical surroundings and of harnessing the broader situational context when managing communicative processes such as engagement, turn taking, language understanding, and dialog management. Finally, we outline an open-world learning challenge that spans these different levels.


Turn Taking for Human-Robot Interaction

Chao, Crystal (Georgia Institute of Technology) | Thomaz, Andrea Lockerd (Georgia Institute of Technology)

AAAI Conferences

Applications in Human-Robot Interaction (HRI) in the not-so-distant future include robots that collaborate with factory workers or serve us as caregivers or waitstaff. When offering customized functionality in these dynamic environments, robots need to engage in real-time exchanges with humans. Robots thus need to be capable of participating in smooth turn-taking interactions. The HRI research goal of unstructured dialogic interaction is communication with robots that is as natural as communication with other humans. Turn-taking is the framework that provides structure for human communication. Consciously or subconsciously, humans are able to communicate their understanding and control of the turn structure to a conversation partner by using syntax, semantics, paralinguistic cues, eye gaze, and body language in a socially intelligent way. Our research aims to show that by implementing these turn-taking cues within an interaction architecture that is designed fundamentally for turn-taking, a robot becomes easier and more efficient for a human to interact with. This paper outlines our approach and initial pilot study into this line of research.