Chen, Maximillian
Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling
Chen, Maximillian, Sun, Ruoxi, Arık, Sercan Ö.
Conversational assistants are increasingly popular across diverse real-world applications, highlighting the need for advanced multimodal speech modeling. Speech, as a natural mode of communication, encodes rich user-specific characteristics such as speaking rate and pitch, making it critical for effective interaction. Our work introduces a data-centric customization approach for efficiently enhancing multimodal understanding in conversational speech modeling. Central to our contributions is a novel multi-task learning paradigm in which auxiliary tasks are designed to exploit a small amount of speech data. Our approach achieves state-of-the-art performance on the Spoken-SQuAD benchmark using only 10% of the training data and open-weight models, establishing a robust and efficient framework for audio-centric conversational modeling. We also introduce ASK-QA, the first dataset for multi-turn spoken dialogue with ambiguous user requests and dynamic evaluation inputs. Code and data forthcoming.
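The abstract above does not spell out the auxiliary objectives, so the following is only a generic sketch of the kind of weighted multi-task loss such a paradigm typically combines; the particular auxiliary tasks and weights are assumptions, not the paper's recipe.

```python
# Generic weighted multi-task objective of the kind alluded to above.
# The auxiliary tasks (e.g., transcription or speaker-attribute prediction)
# and their weights are illustrative assumptions, not the paper's method.
def multi_task_loss(primary_loss, aux_losses, aux_weights):
    """Combine the main spoken-QA loss with weighted auxiliary-task losses."""
    total = primary_loss
    for loss, weight in zip(aux_losses, aux_weights):
        total = total + weight * loss
    return total
```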
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Chen, Maximillian, Sun, Ruoxi, Arık, Sercan Ö., Pfister, Tomas
Large language models (LLMs) aligned through reinforcement learning from human feedback (RLHF) have quickly become one of the dominant paradigms for building intelligent conversational assistant agents. However, despite strong performance across many benchmarks, LLM-based agents still lack conversational skills such as disambiguation: when faced with ambiguity, generalized assistants often overhedge or implicitly guess users' ground-truth intents rather than asking clarification questions. Moreover, in task-specific settings, high-quality conversation samples are often limited, constraining models' ability to learn optimal dialogue action policies. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO) that allows for sample-efficient dialogue policy learning in multi-turn conversation. We demonstrate ACT's efficacy under sample-efficient conditions on three difficult conversational tasks: tabular-grounded question answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for text-to-SQL generation. We additionally propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation modeling improvements over standard approaches to supervised fine-tuning and DPO.
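As a rough illustration of the contrastive objective ACT builds on, here is a minimal sketch of the standard DPO loss over preference pairs whose responses correspond to different dialogue actions (for example, a clarifying question versus a direct guess). It assumes per-sequence log-probabilities have already been computed; the variable names are illustrative and not taken from the paper's code.

```python
# Sketch of the DPO-style contrastive objective that ACT builds on.
# Inputs are summed token log-probs for a "winning" response (e.g., a
# clarifying question when the request is ambiguous) and a "losing" response
# (e.g., a direct guess), under both the policy and a frozen reference model.
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one batch of preference pairs."""
    # Log-ratios of policy vs. reference for the preferred and dispreferred actions.
    pi_ratio_w = policy_logp_w - ref_logp_w
    pi_ratio_l = policy_logp_l - ref_logp_l
    # Encourage the policy to widen the margin between the two dialogue actions.
    logits = beta * (pi_ratio_w - pi_ratio_l)
    return -F.logsigmoid(logits).mean()
```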
Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
Yu, Xiao, Chen, Maximillian, Yu, Zhou
Planning for goal-oriented dialogue often requires simulating future dialogue interactions and estimating task progress. Many approaches thus consider training neural networks to perform look-ahead search algorithms such as A* search and Monte Carlo Tree Search (MCTS). However, this training often requires abundant annotated data, which creates challenges when faced with noisy annotations or low-resource settings. We introduce GDP-Zero, an approach using Open-Loop MCTS to perform goal-oriented dialogue policy planning without any model training. GDP-Zero prompts a large language model to act as a policy prior, value function, user simulator, and system model during the tree search. We evaluate GDP-Zero on the goal-oriented task PersuasionForGood, and find that its responses are preferred over ChatGPT up to 59.32% of the time, and are rated more persuasive than ChatGPT during interactive evaluations.
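The search loop described above can be pictured with a highly simplified open-loop MCTS skeleton. The llm_system, llm_user, and llm_value callables are hypothetical stand-ins for prompting an LLM to play the roles named in the abstract; the prompts, rollout depth, and scoring are assumptions rather than GDP-Zero's actual implementation.

```python
# Highly simplified skeleton of prompt-based open-loop MCTS in the spirit of
# GDP-Zero. All llm_* callables are hypothetical stand-ins for prompting an LLM
# as policy prior / system model / user simulator / value function.
import math
import random
from collections import defaultdict

def uct_select(Q, N, actions, c=1.0):
    """Pick the dialogue act with the highest UCT score."""
    total = sum(N[a] for a in actions) + 1
    return max(actions, key=lambda a: Q[a] + c * math.sqrt(math.log(total) / (N[a] + 1)))

def gdp_zero_step(history, actions, llm_system, llm_user, llm_value, n_sims=20, depth=3):
    """Return the dialogue act whose simulated continuations look most promising."""
    Q, N = defaultdict(float), defaultdict(int)
    for _ in range(n_sims):
        act = uct_select(Q, N, actions)
        sim_history = list(history)
        # Open-loop rollout: system and user turns are resampled each simulation.
        sim_history.append(llm_system(sim_history, act))           # system turn for this act
        for _ in range(depth):
            sim_history.append(llm_user(sim_history))               # simulated user reply
            sim_history.append(llm_system(sim_history, random.choice(actions)))
        value = llm_value(sim_history)                              # prompted value estimate in [0, 1]
        N[act] += 1
        Q[act] += (value - Q[act]) / N[act]                         # running mean of returns
    return max(actions, key=lambda a: Q[a])
```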
Controllable Mixed-Initiative Dialogue Generation through Prompting
Chen, Maximillian, Yu, Xiao, Shi, Weiyan, Awasthi, Urvi, Yu, Zhou
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control. Conversational agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner. The standard approach has been fine-tuning pre-trained language models to perform generation conditioned on these intents. However, these supervised generation models are limited by the cost and quality of data annotation. We instead prompt large language models as a drop-in replacement for fine-tuning on conditional generation. We formalize prompt construction for controllable mixed-initiative dialogue. Our findings show improvements over fine-tuning and ground-truth responses according to human evaluation and automatic metrics on two tasks: PersuasionForGood and Emotional Support Conversations.
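A minimal sketch of the kind of prompt assembly this describes, assuming a policy planner supplies the next dialogue strategy; the strategy descriptions and prompt format are illustrative placeholders, not the exact templates from the paper.

```python
# Illustrative sketch of conditioning generation on a planner-chosen dialogue
# strategy via prompting rather than fine-tuning. Strategy descriptions and the
# exemplar format are placeholders, not the paper's prompts.
STRATEGY_DESCRIPTIONS = {
    "emotion_appeal": "The Persuader appeals to the Persuadee's emotions.",
    "logical_appeal": "The Persuader uses facts and reasoning.",
}

def build_prompt(history, strategy, exemplars):
    """Assemble a conditional-generation prompt for the next system turn."""
    lines = ["The following is a conversation between a Persuader and a Persuadee."]
    lines.extend(exemplars)                        # few-shot, expert-written exchanges
    lines.extend(history)                          # the dialogue so far
    lines.append(STRATEGY_DESCRIPTIONS[strategy])  # planner-chosen intent as an instruction
    lines.append("Persuader:")                     # generation proceeds from here
    return "\n".join(lines)
```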
Pre-Finetuning for Few-Shot Emotional Speech Recognition
Chen, Maximillian, Yu, Zhou
Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
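A rough sketch of the two-stage recipe, using the Hugging Face Transformers Wav2Vec2 classification head and the checkpoint name facebook/wav2vec2-base as an assumed stand-in for the backbone; the training loops and corpus permutations are elided.

```python
# Rough sketch of the pre-finetune -> few-shot fine-tune recipe. The checkpoint
# name and label counts are illustrative assumptions; training loops are elided.
from transformers import Wav2Vec2ForSequenceClassification

# Stage 1: pre-finetune the backbone on one or more upstream emotion corpora.
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4  # e.g., 4 emotion classes upstream
)
# ... standard supervised fine-tuning on the pre-finetuning corpora ...
model.save_pretrained("wav2vec2-prefinetuned")

# Stage 2: reload the pre-finetuned backbone with a fresh classification head
# and fine-tune on only a handful of labeled downstream examples.
downstream = Wav2Vec2ForSequenceClassification.from_pretrained(
    "wav2vec2-prefinetuned", num_labels=5, ignore_mismatched_sizes=True
)
# ... few-shot fine-tuning on the Emotional Speech Dataset ...
```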
PLACES: Prompting Language Models for Social Conversation Synthesis
Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Kim, Seokhwan, Rosenbaum, Andy, Liu, Yang, Yu, Zhou, Hakkani-Tur, Dilek
Collecting high-quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This includes various dimensions of conversation quality with human evaluation directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.
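As an illustration of the in-context synthesis step, the sketch below prompts a generic LLM completion function with a few expert-written conversations and a target topic; the exemplar format and the generate callable are assumptions, not the paper's prompts or API.

```python
# Sketch of in-context dialogue synthesis in the spirit of PLACES. The
# `generate` callable stands in for any LLM text-completion API; the topic
# field and exemplar format are illustrative assumptions.
def synthesize_conversation(generate, exemplars, topic):
    """Prompt an LLM with expert-written conversations, then elicit a new one."""
    prompt_parts = []
    for ex in exemplars:  # a handful of expert-written example conversations
        prompt_parts.append(
            f"The following is a conversation about {ex['topic']}:\n{ex['dialogue']}"
        )
    prompt_parts.append(f"The following is a conversation about {topic}:\nA:")
    return generate("\n\n".join(prompt_parts))

# Usage: loop over target topics to build a synthetic social-chat corpus.
# corpus = [synthesize_conversation(generate, exemplars, t) for t in topics]
```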
FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric
Chen, Maximillian, Chen, Caitlyn, Yu, Xiao, Yu, Zhou
Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.
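A minimal sketch of the pairing-and-averaging idea, assuming each document has already been parsed into a list of constituency trees and tree_kernel is any normalized tree-kernel similarity; the exact kernel and any symmetrization FastKASSIM applies are omitted.

```python
# Minimal sketch of the pairing-and-averaging idea behind FastKASSIM.
# `tree_kernel` is assumed to be any normalized tree-kernel similarity in [0, 1];
# the specific kernel and symmetrization used by the metric are not shown here.
def fastkassim_score(doc_a_trees, doc_b_trees, tree_kernel):
    """Average, over sentences in doc A, the best kernel match found in doc B."""
    best_matches = []
    for t_a in doc_a_trees:
        best_matches.append(max(tree_kernel(t_a, t_b) for t_b in doc_b_trees))
    return sum(best_matches) / len(best_matches)
```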
Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding
Chen, Maximillian, Papangelis, Alexandros, Tao, Chenyang, Rosenbaum, Andy, Kim, Seokhwan, Liu, Yang, Yu, Zhou, Hakkani-Tur, Dilek
Dialogue understanding tasks often necessitate abundant annotated data to achieve good performance, which presents challenges in low-resource settings. To alleviate this barrier, we explore few-shot data augmentation for dialogue understanding by prompting large pre-trained language models and present a novel approach that iterates on augmentation quality by applying weakly-supervised filters. We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue. Models fine-tuned on our augmented data mixed with few-shot ground truth data are able to approach or surpass existing state-of-the-art performance on both datasets. For DailyDialog specifically, using 10% of the ground truth data, we outperform the current state-of-the-art model, which uses 100% of the data.
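The iterate-and-filter loop can be sketched as follows, with generate_examples and weak_classifier as hypothetical stand-ins for the prompting step and the weakly-supervised filter; the threshold and round count are illustrative.

```python
# Sketch of the iterate-and-filter augmentation loop described above. The
# `generate_examples` and `weak_classifier` callables are hypothetical stand-ins
# for the prompting step and the weakly-supervised filter, respectively.
def augment_with_filtering(seed_data, generate_examples, weak_classifier,
                           rounds=3, threshold=0.8):
    """Grow a labeled set by prompting for new examples and keeping confident ones."""
    pool = list(seed_data)  # (text, label) pairs from the few-shot ground truth
    for _ in range(rounds):
        candidates = generate_examples(pool)      # prompt an LLM with current examples
        for text, label in candidates:
            score = weak_classifier(text, label)  # agreement score from the weak filter
            if score >= threshold:                # keep only label-consistent examples
                pool.append((text, label))
    return pool
```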