Supplementary Materials
A Appendix
A.1 Construction & Schema Details
A.1.1 Conversation Details
The hotel
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Africa > South Africa (0.04)
- North America > Canada (0.04)
- (4 more...)
- Consumer Products & Services > Restaurants (1.00)
- Consumer Products & Services > Hotels (0.96)
- Transportation > Ground > Road (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Africa > South Africa (0.04)
- (13 more...)
- Overview (0.46)
- Research Report > New Finding (0.46)
- Law (0.93)
- Consumer Products & Services > Restaurants (0.68)
- Information Technology (0.68)
- Transportation > Ground > Road (0.46)
ConvFill: Model Collaboration for Responsive Conversational Voice Agents
Srinivas, Vidya, Englhardt, Zachary, Powers, Maximus, Patel, Shwetak, Iyer, Vikram
Deploying conversational voice agents with large language models faces a critical challenge: cloud-based foundation models provide deep reasoning and domain knowledge but introduce latency that disrupts natural conversation, while on-device models respond immediately but lack sophistication. We propose conversational infill, a task where a lightweight on-device model generates contextually appropriate dialogue while seamlessly incorporating streaming knowledge from a powerful backend model. This approach decouples response latency from model capability, enabling systems that feel responsive while accessing the full power of large-scale models. We present ConvFill, a 360M parameter model trained on synthetic multi-domain conversations. Evaluation across multiple backend models shows that conversational infill can be successfully learned, with ConvFill achieving accuracy improvements of 36-42% over standalone small models of the same size while consistently retaining sub-200ms response latencies. Our results demonstrate the promise of this approach for building on-device conversational agents that are both immediately responsive and knowledgeable.
- North America > United States (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
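The conversational-infill loop described in the ConvFill abstract can be sketched in a few lines: a hypothetical on-device model acknowledges the user immediately, then splices in knowledge chunks as they stream from a slower backend. The model stubs, the 0.5 s delay, and the reply text below are illustrative assumptions, not the paper's implementation:

```python
import queue
import threading
import time

def backend_stream(q: queue.Queue) -> None:
    # Stub for the cloud foundation model: deep knowledge, high latency.
    time.sleep(0.5)  # simulated network + inference delay
    for chunk in ["According to the backend,", "the museum opens at 9am."]:
        q.put(chunk)
    q.put(None)  # end-of-stream sentinel

def on_device_reply(q: queue.Queue) -> str:
    # Stub for the small on-device model: responds instantly, then infills
    # backend knowledge into the ongoing utterance as it arrives.
    parts = ["Let me check that for you."]  # immediate acknowledgement
    while (chunk := q.get()) is not None:
        parts.append(chunk)  # splice streamed knowledge into the reply
    return " ".join(parts)

q: queue.Queue = queue.Queue()
threading.Thread(target=backend_stream, args=(q,), daemon=True).start()
reply = on_device_reply(q)
print(reply)
```

The key property is that the first words of the reply are available before the backend produces anything; a real system would hand them to the speech synthesizer immediately rather than buffering the whole string.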
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
Bohus, Dan, Andrist, Sean, Paradiso, Ann, Saw, Nick, Schoonbeek, Tim, Stiber, Maia
We introduce SigmaCollab, a dataset enabling research on physically situated human-AI collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing procedural tasks in the physical world. SigmaCollab includes a set of rich, multimodal data streams, such as the participant and system audio, egocentric camera views from the head-mounted device, depth maps, head, hand and gaze tracking information, as well as additional annotations performed post-hoc. While the dataset is relatively small in size (~ 14 hours), its application-driven and interactive nature brings to the fore novel research challenges for human-AI collaboration, and provides more realistic testing grounds for various AI models operating in this space. In future work, we plan to use the dataset to construct a set of benchmarks for physically situated collaboration in mixed-reality task assistive scenarios. SigmaCollab is available at https://github.com/microsoft/SigmaCollab.
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > Kentucky (0.04)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Human Computer Interaction > Interfaces (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing
Xiang, Hao, Tang, Tianyi, Su, Yang, Yu, Bowen, Yang, An, Huang, Fei, Zhang, Yichang, Lu, Yaojie, Lin, Hongyu, Han, Xianpei, Zhou, Jingren, Lin, Junyang, Sun, Le
Recent advancements in Large Language Models (LLMs) have shown outstanding potential for role-playing applications. Evaluating these capabilities is becoming crucial yet remains challenging. Existing benchmarks mostly adopt a character-centric approach, simplify user-character interactions to isolated Q&A tasks, and fail to reflect real-world applications. To address this limitation, we introduce RMTBench, a comprehensive user-centric bilingual role-playing benchmark featuring 80 diverse characters and over 8,000 dialogue rounds. RMTBench includes custom characters with detailed backgrounds and abstract characters defined by simple traits, enabling evaluation across various user scenarios. Our benchmark constructs dialogues based on explicit user motivations rather than character descriptions, ensuring alignment with practical user applications. Furthermore, we construct an authentic multi-turn dialogue simulation mechanism. With carefully selected evaluation dimensions and LLM-based scoring, this mechanism captures the complex intentions of conversations between the user and the character. By shifting focus from character background to user intention fulfillment, RMTBench bridges the gap between academic evaluation and practical deployment requirements, offering a more effective framework for assessing role-playing capabilities in LLMs. All code will be released soon; the datasets are available at https://huggingface.co/datasets/xiangh/RMTBENCH.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (7 more...)
- Research Report (0.50)
- Overview (0.46)
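The user-centric simulation loop that RMTBench describes, dialogues driven by an explicit user motivation and scored by an LLM judge along selected dimensions, might look like this in outline. All names, the toy stand-ins, and the fixed scores are assumptions for illustration, not the benchmark's actual code:

```python
def simulate_dialogue(user_sim, character_lm, motivation: str, max_turns: int = 4):
    # The dialogue is driven by the user's motivation, not the character card.
    history: list[tuple[str, str]] = []
    for _ in range(max_turns):
        user_turn = user_sim(motivation, history)
        reply = character_lm(history, user_turn)
        history.append((user_turn, reply))
    return history

def judge(history, dimensions=("consistency", "intention_fulfillment")):
    # Stub for an LLM judge scoring each evaluation dimension; a real judge
    # would read the transcript rather than return constants.
    return {d: 5 for d in dimensions}

# Toy stand-ins for the simulated user and the role-playing model under test:
user_sim = lambda motivation, history: f"(turn {len(history) + 1}) {motivation}"
character_lm = lambda history, user_turn: "In-character reply to: " + user_turn

history = simulate_dialogue(user_sim, character_lm, "seek advice from a detective")
scores = judge(history)
print(scores)
```

The point of the structure is that the character model never sees the user's motivation directly; fulfillment of that motivation is judged only from the resulting transcript.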
FABRIC: Framework for Agent-Based Realistic Intelligence Creation
Verma, Abhigya, Subramanian, Seganrasan, Kandasamy, Nandhakumar, Gupta, Naman
Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data: structured interaction records that couple user intents with tool specifications, argument-grounded calls, and verifiable execution traces. However, collecting such data from human annotators is costly, time-consuming, and difficult to scale. We present a unified framework for synthesizing agentic data using only LLMs, without any human-in-the-loop supervision. This framework decomposes generation into modular pipelines that produce complete interaction records spanning task specifications, tool definitions, policy pseudocode, natural language exchanges, and execution traces. Records conform to strict syntactic and semantic constraints, ensuring machine-parseability and faithful alignment across inputs, outputs, and tool calls. Beyond single tasks, the framework supports both multi-task and multi-turn agent interactions, enabling the construction of datasets that reflect the full spectrum of tool-use competencies. To ensure quality and consistency, the framework integrates constrained generation formats, JSON-schema validation, and judge-based filtering. This paper formalizes the schema for agentic records, details the prompt design principles that guide generation, and introduces scalable pipelines for high-quality synthetic data. By providing a reproducible, LLM-only alternative to manual collection, the framework advances the development of agentic LLMs capable of robust tool use.
- North America > United States > California (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > New York (0.04)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Law (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology (0.67)
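As a rough illustration of the schema conformance FABRIC enforces, the sketch below validates a toy agentic record: required fields must be present and typed, and every call must be grounded in a defined tool. The field names and the hand-rolled checker are assumptions standing in for the paper's actual schema and its JSON-schema validation:

```python
import json

# Hypothetical agentic record; field names are illustrative only.
record = {
    "task": "Book a table for two at 7pm",
    "tools": [{"name": "reserve_table", "args": {"time": "str", "party_size": "int"}}],
    "calls": [{"tool": "reserve_table", "args": {"time": "19:00", "party_size": 2}}],
    "trace": ["reserve_table returned a confirmation"],
}

REQUIRED = {"task": str, "tools": list, "calls": list, "trace": list}

def validate(rec: dict) -> list[str]:
    # Stand-in for full JSON-schema validation: check required keys and
    # types, and that every call references a defined tool.
    errors = [f"missing or mistyped field: {k}"
              for k, t in REQUIRED.items() if not isinstance(rec.get(k), t)]
    defined = {t["name"] for t in rec.get("tools", [])}
    errors += [f"call to undefined tool: {c['tool']}"
               for c in rec.get("calls", []) if c["tool"] not in defined]
    return errors

assert json.loads(json.dumps(record)) == record  # machine-parseable round-trip
print(validate(record))  # → [] for a well-formed record
```

In the full framework this gate would sit between generation and judge-based filtering, discarding records whose calls are not aligned with their tool definitions.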
MAFA: A multi-agent framework for annotation
Hegazy, Mahmood, Rodrigues, Aaron, Naeem, Azzam
Modern consumer banking applications require accurate and efficient retrieval of information in response to user queries. Mapping user utterances to the most relevant Frequently Asked Questions (FAQs) is a crucial component of these systems. Traditional approaches often rely on a single model or technique, which may not capture the nuances of diverse user inquiries. In this paper, we introduce a multi-agent framework for FAQ annotation that combines multiple specialized agents with different approaches and a judge agent that reranks candidates to produce optimal results. Our agents utilize a structured reasoning approach inspired by Attentive Reasoning Queries (ARQs), which guides them through systematic reasoning steps using targeted, task-specific JSON queries. Our framework features a few-shot example strategy, where each agent receives different few-shots, enhancing ensemble diversity and coverage of the query space. We evaluate our framework on a real-world major bank dataset as well as public benchmark datasets (LCQMC and FiQA), demonstrating significant improvements over single-agent approaches across multiple metrics, including a 14% increase in Top-1 accuracy, an 18% increase in Top-5 accuracy, and a 12% improvement in Mean Reciprocal Rank on our dataset, and similar gains on public benchmarks when compared with traditional and single-agent annotation techniques. Our framework is particularly effective at handling ambiguous queries, making it well-suited for deployment in production banking applications while showing strong generalization capabilities across different domains and languages.
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (2 more...)
- Frequently Asked Questions (FAQ) (1.00)
- Research Report (0.64)
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.46)
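The ensemble-plus-judge pattern in the MAFA abstract can be sketched with toy heuristic agents in place of the LLM agents: each agent ranks the FAQ candidates from a different view of the query, and a judge reranks by aggregating their votes. The Borda-count judge and both heuristics are assumptions; the paper's agents use ARQ-style structured reasoning and distinct few-shot sets:

```python
import re
from collections import Counter

FAQS = [
    "How do I reset my online banking password?",
    "What are the fees for international wire transfers?",
    "How do I report a lost or stolen card?",
]

def overlap_agent(query: str, faqs: list[str]) -> list[str]:
    # Stub for one LLM agent: rank FAQs by shared words with the query.
    q_words = set(re.findall(r"\w+", query.lower()))
    return sorted(faqs, reverse=True,
                  key=lambda f: len(q_words & set(re.findall(r"\w+", f.lower()))))

def trigram_agent(query: str, faqs: list[str]) -> list[str]:
    # Stub for a second agent covering the query space differently.
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)}
    q_grams = grams(query.lower())
    return sorted(faqs, reverse=True, key=lambda f: len(q_grams & grams(f.lower())))

def judge(rankings: list[list[str]]) -> list[str]:
    # Stub judge agent: rerank candidates by Borda count over agent rankings.
    votes: Counter = Counter()
    for ranking in rankings:
        for pos, faq in enumerate(ranking):
            votes[faq] += len(ranking) - pos
    return [faq for faq, _ in votes.most_common()]

query = "I lost my card what should I do"
best = judge([agent(query, FAQS) for agent in (overlap_agent, trigram_agent)])[0]
print(best)  # → "How do I report a lost or stolen card?"
```

Giving each agent a different view of the query (here, words versus character trigrams; in the paper, different few-shot examples) is what makes the ensemble cover more of the query space than any single agent.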
Flipping the Dialogue: Training and Evaluating User Language Models
Naous, Tarek, Laban, Philippe, Xu, Wei, Neville, Jennifer
Conversations with LMs involve two participants: a human user leading the conversation, and an LM assistant responding to the user's request. To satisfy this specific role, LMs are post-trained to be helpful assistants -- optimized to produce exhaustive and well-structured responses, free of ambiguity and grammar errors. User utterances, on the other hand, are rarely perfected, with each user phrasing requests in unique ways, sometimes putting in partial effort at each turn and refining on the fly. To evaluate LM performance in realistic settings, prior work simulated users in multi-turn conversations, often prompting an LLM originally trained to be a helpful assistant to act as a user. However, we show that assistant LMs make for poor user simulators, with the surprising finding that better assistants yield worse simulators. Instead, we introduce purpose-built User Language Models (User LMs) - models post-trained to simulate human users in multi-turn conversations. Through various evaluations, we show how User LMs align better with human behavior and achieve better simulation robustness than existing simulation methods. When leveraging User LMs to simulate coding and math conversations, the performance of a strong assistant (GPT-4o) drops from 74.6% to 57.4%, confirming that more realistic simulation environments lead to assistant struggles as they fail to cope with the nuances of users in multi-turn setups.
- North America > United States (0.14)
- Africa > Middle East > Algeria (0.04)
- Health & Medicine (0.69)
- Leisure & Entertainment > Games > Computer Games (0.34)
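A User LM inverts the usual simulation loop: the user model produces imperfect, evolving turns and decides when the conversation ends, while the assistant under evaluation responds. The canned turns and stub models below are assumptions sketching only that control flow:

```python
USER_TURNS = [
    "write me a sort function",               # partial-effort opening request
    "no i meant descending, and in python",   # refinement on the fly
    "ok thanks",                              # user decides the task is done
]

def user_lm(history):
    # Stub User LM: emits the next human-like turn; a real User LM is
    # post-trained to generate such turns conditioned on the full history.
    return USER_TURNS[len(history)]

def assistant(history, user_turn):
    # Stand-in for the assistant under evaluation (e.g. GPT-4o).
    return f"Here is my best attempt at: {user_turn!r}"

history = []
while True:
    turn = user_lm(history)
    if turn == "ok thanks":  # user-side stopping decision
        break
    history.append((turn, assistant(history, turn)))

print(len(history))  # → 2 completed exchanges
```

Note that the simulated user, not the assistant, controls both the phrasing and the termination of the dialogue, which is exactly what helpful-assistant LMs prompted to "act as a user" tend to get wrong.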