AITopics | multiwoz 2

Collaborating Authors

multiwoz 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond More Context: Retrieval Diversity Boosts Multi-Turn Intent Understanding

Lin, Zhiming

arXiv.org Artificial IntelligenceOct-22-2025

Multi turn intent understanding is central to task oriented chatbots, yet real deployments face tight token budgets and noisy contexts, and most retrieval pipelines emphasize relevance while overlooking set level diversity and confounds such as more context or exemplar order. We ask whether retrieval diversity, rather than longer prompts, systematically improves LLM intent understanding under fixed budgets. We present a diversity aware retrieval framework that selects in context exemplars to balance intent coverage and linguistic variety, and integrates this selection with standard LLM decoders; the evaluation enforces budget matched prompts and randomized positions, and includes sensitivity analyses over exemplar count, diversity strength, and backbone size. On MultiWOZ 2.4 and SGD, the approach achieves strong gains in Joint Goal Accuracy under equal token budgets, surpassing strong LLM/DST baselines, with consistent improvements across K from 4 to 7 and moderate latency. Overall, the study isolates and validates the impact of content diversity in retrieval and offers a simple, deployable selection principle for building accurate, budget constrained multi turn intent systems.

exemplar, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.1794

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMs

Ahmad, Adnan, Kowol, Philine, Hillmann, Stefan, Möller, Sebastian

arXiv.org Artificial IntelligenceSep-15-2025

In this paper, we provide an extensive analysis of multi-label intent classification using Large Language Models (LLMs) that are open-source, publicly available, and can be run in consumer hardware. We use the MultiWOZ 2.1 dataset, a benchmark in the dialogue system domain, to investigate the efficacy of three popular open-source pre-trained LLMs, namely LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B. We perform the classification task in a few-shot setup, giving 20 examples in the prompt with some instructions. Our approach focuses on the differences in performance of these models across several performance metrics by methodically assessing these models on multi-label intent classification tasks. Additionally, we compare the performance of the instruction-based fine-tuning approach with supervised learning using the smaller transformer model BertForSequenceClassification as a baseline. To evaluate the performance of the models, we use evaluation metrics like accuracy, precision, and recall as well as micro, macro, and weighted F1 score. We also report the inference time, VRAM requirements, etc. The Mistral-7B-v0.1 outperforms two other generative models on 11 intent classes out of 14 in terms of F-Score, with a weighted average of 0.50. It also has relatively lower Humming Loss and higher Jaccard Similarity, making it the winning model in the few-shot setting. We find BERT based supervised classifier having superior performance compared to the best performing few-shot generative LLM. The study provides a framework for small open-source LLMs in detecting complex multi-intent dialogues, enhancing the Natural Language Understanding aspect of task-oriented chatbots.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2509.1001

Country: Europe > Germany (0.15)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

A Computational Approach to Modeling Conversational Systems: Analyzing Large-Scale Quasi-Patterned Dialogue Flows

Ammar, Mohamed Achref Ben, Bennani, Mohamed Taha

arXiv.org Artificial IntelligenceJul-21-2025

--The analysis of conversational dynamics has gained increasing importance with the rise of large language model-based systems, which interact with users across diverse contexts. In this work, we propose a novel computational framework for constructing conversational graphs that capture the flow and structure of loosely organized dialogues, referred to as quasi-patterned conversations. We introduce the Filter & Reconnect method, a novel graph simplification technique that minimizes noise while preserving semantic coherence and structural integrity of conversational graphs. Through comparative analysis, we demonstrate that the use of large language models combined with our graph simplification technique has resulted in semantic metric S increasing by a factor of 2.06 compared to previous approaches while simultaneously enforcing a tree-like structure with 0 δ -hyperbolicity, ensuring optimal clarity in conversation modeling. This work provides a computational method for analyzing large-scale dialogue datasets, with practical applications related to monitoring automated systems such as chatbots, dialogue management tools, and user behavior analytics.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/EUROCON64445.2025.11073224

2507.13544

Country:

Asia (0.28)
Africa > Middle East > Tunisia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Spec-TOD: A Specialized Instruction-Tuned LLM Framework for Efficient Task-Oriented Dialogue Systems

Nguyen, Quang-Vinh, Nguyen, Quang-Chieu, Pham, Hoang, Bui, Khac-Hoai Nam

arXiv.org Artificial IntelligenceJul-8-2025

Task-oriented dialogue (TOD) systems facilitate goal-driven interactions between users and machines. While recent advances in deep learning have improved the performance, TOD systems often struggle in low-resource scenarios with limited labeled data. To address this challenge, we propose Spec-TOD, a novel framework designed to train an end-to-end TOD system with limited data. Spec-TOD introduces two main innovations: (i) a novel specialized end-to-end TOD framework that incorporates explicit task instructions for instruction-tuned large language models (LLMs), and (ii) an efficient training strategy that leverages lightweight, specialized LLMs to achieve strong performance with minimal supervision. Experiments on the MultiWOZ dataset, a widely used TOD benchmark, demonstrate that Spec-TOD achieves competitive results while significantly reducing the need for labeled data. These findings highlight the potential of the proposed framework in advancing efficient and effective TOD systems in low-resource settings.

computational linguistic, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.04841

Country:

North America > Canada (0.05)
Asia > Singapore (0.04)
Europe > Italy (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models

Song, Sangmin, Choi, Juhwan, Yun, JungMin, Kim, YoungBin

arXiv.org Artificial IntelligenceJun-13-2025

Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in extracting and tracking dialogue states amidst multiple speakers. Our findings emphasize the need for future research to enhance LLMs for multi-user DST scenarios, paving the way for more realistic and robust DST models.

large language model, machine learning, utterance, (21 more...)

arXiv.org Artificial Intelligence

2506.10504

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Interpretable and Robust Dialogue State Tracking via Natural Language Summarization with LLMs

Carranza, Rafael, Rojas, Mateo Alejandro

arXiv.org Artificial IntelligenceMar-11-2025

This paper introduces a novel approach to Dialogue State Tracking (DST) that leverages Large Language Models (LLMs) to generate natural language descriptions of dialogue states, moving beyond traditional slot-value representations. Conventional DST methods struggle with open-domain dialogues and noisy inputs. Motivated by the generative capabilities of LLMs, our Natural Language DST (NL-DST) framework trains an LLM to directly synthesize human-readable state descriptions. We demonstrate through extensive experiments on MultiWOZ 2.1 and Taskmaster-1 datasets that NL-DST significantly outperforms rule-based and discriminative BERT-based DST baselines, as well as generative slot-filling GPT-2 DST models, in both Joint Goal Accuracy and Slot Accuracy. Ablation studies and human evaluations further validate the effectiveness of natural language state generation, highlighting its robustness to noise and enhanced interpretability. Our findings suggest that NL-DST offers a more flexible, accurate, and human-understandable approach to dialogue state tracking, paving the way for more robust and adaptable task-oriented dialogue systems.

computational linguistic, dialogue state, state description, (13 more...)

arXiv.org Artificial Intelligence

2503.08857

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Asia > Thailand > Bangkok > Bangkok (0.05)
(8 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.86)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model

Acikgoz, Emre Can, Greer, Jeremiah, Datta, Akul, Yang, Ze, Zeng, William, Elachqar, Oussama, Koukoumidis, Emmanouil, Hakkani-Tür, Dilek, Tur, Gokhan

arXiv.org Artificial IntelligenceFeb-18-2025

Large Language Models (LLMs) with API-calling capabilities enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA), and our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CoALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CoALM-IT, a carefully constructed multi-task dataset that interleave multi-turn ReAct reasoning with complex API usage. Using CoALM-IT, we train three models CoALM 8B, CoALM 70B, and CoALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks.This demonstrates the feasibility of a single model approach for both TOD and LA, setting a new standard for conversational agents.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.0882

Country:

Asia (1.00)
Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding

Safa, Abdulfattah, Şahin, Gözde Gül

arXiv.org Artificial IntelligenceJan-8-2025

Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. Majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, struggling with adapting to new slots values. While Large Language Models (LLMs)-based systems show promising zero-shot DST performance, they either require extensive computational resources or they underperform existing fully-trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology allowing the system to adapt dynamically. We compare our approach with existing SOTA, and show that it provides up to 20% better Joint Goal Accuracy (JGA) over previous methods on datasets like Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.

dataset, multiwoz 2, slot value, (15 more...)

arXiv.org Artificial Intelligence

2409.15861

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
North America > Dominican Republic (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry: Transportation > Ground (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Intent-driven In-context Learning for Few-shot Dialogue State Tracking

Yi, Zihao, Xu, Zhe, Shen, Ying

arXiv.org Artificial IntelligenceDec-4-2024

Dialogue state tracking (DST) plays an essential role in task-oriented dialogue systems. However, user's input may contain implicit information, posing significant challenges for DST tasks. Additionally, DST data includes complex information, which not only contains a large amount of noise unrelated to the current turn, but also makes constructing DST datasets expensive. To address these challenges, we introduce Intent-driven In-context Learning for Few-shot DST (IDIC-DST). By extracting user's intent, we propose an Intent-driven Dialogue Information Augmentation module to augment the dialogue information, which can track dialogue states more effectively. Moreover, we mask noisy information from DST data and rewrite user's input in the Intent-driven Examples Retrieval module, where we retrieve similar examples. We then utilize a pre-trained large language model to update the dialogue state using the augmented dialogue information and examples. Experimental results demonstrate that IDIC-DST achieves state-of-the-art performance in few-shot settings on MultiWOZ 2.1 and MultiWOZ 2.4 datasets.

computational linguistic, dialogue state tracking, information, (13 more...)

arXiv.org Artificial Intelligence

2412.0327

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Asia > China > Guangdong Province > Shenzhen (0.05)
North America > United States > Rhode Island (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems

Gupta, Aman, Ravichandran, Anirudh, Zhang, Ziji, Shah, Swair, Beniwal, Anurag, Sadagopan, Narayanan

arXiv.org Artificial IntelligenceNov-1-2024

Task-oriented dialogue systems are essential for applications ranging from customer service to personal assistants and are widely used across various industries. However, developing effective multi-domain systems remains a significant challenge due to the complexity of handling diverse user intents, entity types, and domain-specific knowledge across several domains. In this work, we propose DARD (Domain Assigned Response Delegation), a multi-agent conversational system capable of successfully handling multi-domain dialogs. DARD leverages domain-specific agents, orchestrated by a central dialog manager agent. Our extensive experiments compare and utilize various agent modeling approaches, combining the strengths of smaller fine-tuned models (Flan-T5-large & Mistral-7B) with their larger counterparts, Large Language Models (LLMs) (Claude Sonnet 3.0). We provide insights into the strengths and limitations of each approach, highlighting the benefits of our multi-agent framework in terms of flexibility and composability. We evaluate DARD using the well-established MultiWOZ benchmark, achieving state-of-the-art performance by improving the dialogue inform rate by 6.6% and the success rate by 4.1% over the best-performing existing approaches. Additionally, we discuss various annotator discrepancies and issues within the MultiWOZ dataset and its evaluation system.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.00427

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback