A Multimodal Conversational Agent for Tabular Data Analysis
Awad, Mohammad Nour Al, Ivanov, Sergey, Tikhonova, Olga, Khodnenko, Ivan
–arXiv.org Artificial Intelligence
Abstract--Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation in an interactive, context-aware dialogue with users, including voice interaction, while maintaining high performance. The system lets users query datasets with voice or text instructions and receive answers as plots, tables, statistics, or spoken explanations. Built on LLMs, the suggested design combines OpenAI Whisper automatic speech recognition (ASR) system, Qwen-coder code generation LLM/model, custom sandboxed execution tools, and Coqui library for text-to-speech (TTS) within an agentic orchestration loop. Unlike text-only analysis tools, it adapts responses across modalities and supports multi-turn dialogues grounded in dataset context. In an evaluation of 48 tasks on three datasets, our prototype achieved 95.8% accuracy with model-only generation time under 1.7 seconds (excluding ASR and execution time). A comparison across five LLM sizes (1.5B-32B) revealed accuracy-latency-cost trade-offs, with a 7B model providing the best balance for interactive use. By routing between conversation with user and code execution, constrained to a transparent sandbox, with simultaneously grounding prompts in schema-level context, the T alk2Data agent reliably retrieves actionable insights from tables while making computations verifiable. In the article, except for the T alk2Data agent itself, we discuss implications for human-data interaction, trust in LLM-driven analytics, and future extensions toward large-scale multimodal assistants. Interacting with data often requires programming skills or statistical expertise, creating barriers for managers, analysts, and other non-technical users [1], [2]. Natural language interfaces (NLIs) aim to improve this information seeking process by allowing users to query data conversationally [3], [4]. At the same time, voice interfaces are becoming increasingly common in daily life, yet existing voice assistants remain limited: they can answer factual questions or control devices, but they lack the analytical capabilities needed for meaningful data exploration.
arXiv.org Artificial Intelligence
Nov-25-2025
- Country:
- Asia > Russia (0.14)
- Europe > Russia
- North America > United States
- Texas > Andrews County (0.04)
- Genre:
- Research Report (0.51)
- Industry:
- Information Technology > Security & Privacy (0.68)
- Technology: