TED: Turn Emphasis with Dialogue Feature Attention for Emotion Recognition in Conversation

Ono, Junya, Wakaki, Hiromi

arXiv.org Artificial Intelligence

Emotion recognition in conversation (ERC) has attracted attention through methods that model multi-turn contexts. Feeding multi-turn input to a pre-trained model implicitly assumes that the current turn and the other turns are distinguished during training by special tokens inserted into the input sequence. This paper proposes a priority-based attention method, called Turn Emphasis with Dialogue (TED), that distinguishes each turn explicitly by adding dialogue features to the attention mechanism. TED assigns each turn a priority based on turn position and speaker information, treated as dialogue features. It applies multi-head self-attention over turn-based vectors of the multi-turn input and adjusts the attention scores with these dialogue features. We evaluate TED on four typical benchmarks. The experimental results demonstrate that TED performs strongly across all datasets and achieves state-of-the-art performance on IEMOCAP, whose dialogues contain numerous turns.
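
The core mechanism can be pictured as an additive per-turn bias on the attention logits. Below is a minimal PyTorch sketch under that assumption; the function and variable names (priority_attention, turn_priority values) are illustrative, and the actual TED model uses learned projections and the specific priority scheme defined in the paper.

import torch
import torch.nn.functional as F

def priority_attention(turn_vecs, priorities, num_heads=4):
    """Toy self-attention over turn vectors with an additive priority
    bias on the logits (illustrative, not the paper's exact form)."""
    T, d = turn_vecs.shape
    dh = d // num_heads
    q = turn_vecs.view(T, num_heads, dh).transpose(0, 1)  # (H, T, dh)
    k = q  # real models use separate learned Q/K/V projections
    v = q
    logits = q @ k.transpose(-2, -1) / dh ** 0.5          # (H, T, T)
    # Bias each query's logits by the key turn's priority, e.g. a score
    # derived from turn position and speaker identity.
    logits = logits + priorities.view(1, 1, T)
    attn = F.softmax(logits, dim=-1)
    return (attn @ v).transpose(0, 1).reshape(T, d)

# Example: 5 turns, the current (last) turn gets the highest priority.
turns = torch.randn(5, 64)
prios = torch.tensor([0.0, 0.1, 0.2, 0.3, 1.0])
ctx = priority_attention(turns, prios)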


A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4

Gu, Ming, Yang, Yan

arXiv.org Artificial Intelligence

Dialogue state tracking (DST) is typically evaluated with exact-matching methods, which rely on large amounts of labeled data and ignore semantic consistency, leading to over-evaluation. Leveraging large language models (LLMs) to evaluate natural language processing tasks has recently achieved promising results, but using LLMs for DST evaluation remains underexplored. In this paper, we propose a two-dimensional zero-shot evaluation method for DST using GPT-4, which divides the evaluation into two dimensions: accuracy and completeness. Furthermore, we design two manual reasoning paths in the prompting to further improve the accuracy of the evaluation. Experimental results show that our method outperforms the baselines and is consistent with traditional exact-matching-based methods.
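
For illustration, a zero-shot judgment along the two dimensions could be requested from GPT-4 roughly as follows. The prompt wording below is a paraphrase assumed for this sketch, not the paper's exact prompts or reasoning paths; the client usage follows the standard OpenAI Chat Completions API.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """You are evaluating a dialogue state tracking prediction.
Dialogue: {dialogue}
Gold state: {gold}
Predicted state: {pred}

Judge the prediction on two dimensions:
1. Accuracy: is every predicted slot-value pair supported by the dialogue?
2. Completeness: does the prediction cover every slot in the gold state?
Answer with two lines: "accuracy: yes/no" and "completeness: yes/no"."""

def evaluate_dst(dialogue: str, gold: str, pred: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(dialogue=dialogue,
                                            gold=gold, pred=pred)}],
        temperature=0,
    )
    return resp.choices[0].message.content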


Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations

Lin, Guan-Ting, Chiang, Cheng-Han, Lee, Hung-yi

arXiv.org Artificial Intelligence

In spoken dialogue, even if two current turns are the same sentence, their responses may differ when they are spoken in different styles. Speaking styles, which carry paralinguistic and prosodic information, mark the most significant difference between the text and speech modalities. When text-only LLMs are used to model spoken dialogue, they cannot give different responses based on the speaking style of the current turn. In this paper, we focus on enabling LLMs to listen to speaking styles and respond properly. Our goal is to teach the LLM that "even if the sentences are identical, if they are spoken in different styles, their corresponding responses might be different". Since no suitable dataset exists for this goal, we collect a speech-to-speech dataset, StyleTalk, with the following desired characteristic: when two current speeches have the same content but are spoken in different styles, their responses differ. To teach LLMs to understand and respond properly to speaking styles, we propose the Spoken-LLM framework, which models both linguistic content and speaking styles. We train Spoken-LLM on the StyleTalk dataset and devise a two-stage training pipeline to help it better learn the speaking styles. Extensive experiments show that Spoken-LLM outperforms text-only baselines and prior speech-LLM methods.
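
The underlying intuition can be mimicked in text alone: serialize style attributes alongside the transcript, so that identical sentences spoken in different styles produce different model inputs. This is only a toy stand-in; Spoken-LLM fuses learned speech representations with the LLM rather than textual tags, and the attribute names below are assumptions, not StyleTalk's schema.

# Toy sketch: identical content, different styles -> different prompts.
def build_prompt(transcript: str, style: dict) -> str:
    style_str = ", ".join(f"{k}={v}" for k, v in sorted(style.items()))
    return f"[style: {style_str}] User said: {transcript}\nRespond appropriately:"

same_text = "I just finished the project."
print(build_prompt(same_text, {"pitch": "high", "tempo": "fast", "emotion": "excited"}))
print(build_prompt(same_text, {"pitch": "low", "tempo": "slow", "emotion": "tired"}))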


I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue

Li, Yuanchao, Inoue, Koji, Tian, Leimin, Fu, Changzeng, Ishi, Carlos, Ishiguro, Hiroshi, Kawahara, Tatsuya, Lai, Catherine

arXiv.org Artificial Intelligence

Current Spoken Dialogue Systems (SDSs) often serve as passive listeners that respond only after receiving user speech. To achieve human-like dialogue, we propose a novel future prediction architecture that allows an SDS to anticipate future affective reactions from its own current behaviors, before the user speaks. We investigate two scenarios: speech and laughter. For speech, we propose to predict the user's future emotion from its temporal relationship with the system's current emotion and its causal relationship with the system's current Dialogue Act (DA). For laughter, we propose to predict the occurrence and type of the user's laughter from the system's laughter behaviors in the current turn. Preliminary analysis of human-robot dialogue demonstrated synchronicity in the emotions and laughter displayed by the human and robot, as well as DA-emotion causality in their dialogue, verifying that our architecture can contribute to the development of an anticipatory SDS.
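
As a toy illustration of the speech scenario, one could train a classifier mapping the system's current emotion and DA to the user's next-turn emotion. The features, labels, and model choice below are placeholders assumed for the sketch, not the paper's architecture or label set.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (system emotion, system dialogue act) -> user's emotion in the next turn
train_X = [
    {"sys_emotion": "happy",   "sys_da": "praise"},
    {"sys_emotion": "neutral", "sys_da": "question"},
    {"sys_emotion": "happy",   "sys_da": "joke"},
    {"sys_emotion": "neutral", "sys_da": "statement"},
]
train_y = ["happy", "neutral", "happy", "neutral"]

model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
model.fit(train_X, train_y)
print(model.predict([{"sys_emotion": "happy", "sys_da": "joke"}]))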


Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

Le, Hung, Chen, Nancy F., Hoi, Steven C. H.

arXiv.org Artificial Intelligence

Compared to traditional visual question answering, video-grounded dialogues require additional reasoning over the dialogue context to answer questions in a multi-turn setting. Previous approaches mostly treat the dialogue context as simple text input without modelling the inherent information flows at the turn level. In this paper, we propose a novel framework of Reasoning Paths in Dialogue Context (PDC). The PDC model discovers information flows among dialogue turns through a semantic graph constructed from the lexical components of each question and answer, and then learns to predict reasoning paths over this graph. The path predictor outputs a path from the current turn back through past dialogue turns that contain additional visual cues for answering the current question. A reasoning model then sequentially processes visual and textual information along this path, and the propagated features are used to generate the answer. Our experimental results demonstrate the effectiveness of the method and provide additional insight into how models use semantic dependencies in a dialogue context to retrieve visual cues.
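
A rough sketch of the graph-construction step: connect turns that share lexical components, then trace a path from the current turn back through earlier turns. The shortest-path call below stands in for PDC's learned path predictor, and the stopword list and example dialogue are assumptions made for illustration.

import itertools
import networkx as nx

turns = [
    "what is the man holding",
    "he is holding a red cup",
    "does he drink from the cup",
    "what color is the cup he is holding",  # current question
]

def content_words(text):
    stop = {"what", "is", "the", "a", "he", "does", "from"}
    return {w for w in text.split() if w not in stop}

# Edge between two turns if they share lexical components.
g = nx.Graph()
g.add_nodes_from(range(len(turns)))
for i, j in itertools.combinations(range(len(turns)), 2):
    if content_words(turns[i]) & content_words(turns[j]):
        g.add_edge(i, j)

# Heuristic stand-in for the learned predictor: path from the current
# turn to the earlier turn holding the relevant visual cue.
path = nx.shortest_path(g, source=len(turns) - 1, target=1)
print("reasoning path over turns:", path)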


AI for Massive Multiplayer Online Strategy Games

Barata, Alexandre Miguel (Instituto Superior Tecnico, Technical University of Lisbon) | Santos, Pedro Alexandre (Instituto Superior Tecnico, Technical University of Lisbon) | Prada, Rui (Instituto Superior Tecnico, Technical University of Lisbon)

AAAI Conferences

Massive Multiplayer Online Strategy games present several unique challenges to players and designers. Players must constantly adapt to changes in the game itself, and the game must achieve a certain level of simulation and realism. This typically implies battles involving several distinct armies, combat phases, and different terrains; resource management, which involves buying and selling goods and combining many different kinds of resources to fund the player's nation; and cutthroat diplomacy, which dictates the pace of the game. However, these constant changes and simulation mechanisms make a game harder to play, increasing the effort required to play it properly. As some of these games take months to complete, players who become inactive have a negative impact on the game. This work aims to demonstrate how to create versatile agents for playing Massive Multiplayer Online Turn-Based Strategy games while keeping close attention to their playing performance. In a test measuring this performance, the results showed similar survival performance between humans and AIs.