AITopics | error propagation

Collaborating Authors

error propagation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Here we present the proofs for the arguments behind each of the approximations described in Section4.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Add feedback

DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation

Lin, Zexin, Wan, Hawen, Zhong, Yebin, Xiaoqiang, null

arXiv.org Artificial IntelligenceDec-4-2025

Vision-Language Models (VLMs) deployed in safety-critical applications such as autonomous driving must handle continuous visual streams under imperfect conditions. However, existing benchmarks focus on static, high-quality images and ignore temporal degradation and error propagation, which are critical failure modes where transient visual corruption induces hallucinations that persist across subsequent frames. We introduce DIQ-H, the first benchmark for evaluating VLM robustness under dynamic visual degradation in temporal sequences. DIQ-H applies physics-based corruptions including motion blur, sensor noise, and compression artifacts, and measures hallucination persistence, error recovery, and temporal consistency through multi-turn question-answering tasks. To enable scalable annotation, we propose Uncertainty-Guided Iterative Refinement (UIR), which generates reliable pseudo-ground-truth using lightweight VLMs with uncertainty filtering, achieving a 15.3 percent accuracy improvement. Experiments on 16 state-of-the-art VLMs reveal substantial robustness gaps: even advanced models such as GPT-4o achieve only a 78.5 percent recovery rate, while open-source models struggle with temporal consistency at less than 60 percent. DIQ-H provides a comprehensive platform for evaluating VLM reliability in real-world deployments.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.03992

Country: Asia > China (0.29)

Genre: Research Report (0.82)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
(2 more...)

Add feedback

MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

Peng, Yuezhang, Cai, Chonghao, Liu, Ziang, Fan, Shuai, Jiang, Sheng, Xu, Hua, Liu, Yuxin, Chen, Qiguang, Xu, Kele, Li, Yao, Wang, Sheng, Qin, Libo, Chen, Xie

arXiv.org Artificial IntelligenceDec-2-2025

ABSTRACT Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is an absence of a unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding Dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conducted a comprehensive benchmark of leading open-source LLMs and LALMs, covering methods like in-context learning, supervised fine-tuning (SFT), and end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs have the potential to complete SLU tasks through in-context learning, their performance still lags significantly behind SFT. Meanwhile, E2E LALMs demonstrate performance comparable to pipeline approaches and effectively avoid error propagation from speech recognition.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.01603

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization

Neural Information Processing SystemsOct-10-2025, 08:29:02 GMT

The aim of this paper is to address the challenge of gradual domain adaptation within a class of manifold-constrained data distributions.

adaptation, classifier, domain adaptation, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Instance Relation Learning Network with Label Knowledge Propagation for Few-shot Multi-label Intent Detection

Zhao, Shiman, Li, Shangyuan, Chen, Wei, Wang, Tengjiao, Yao, Jiahui, Zheng, Jiabin, Wong, Kam Fai

arXiv.org Artificial IntelligenceOct-10-2025

Few-shot Multi-label Intent Detection (MID) is crucial for dialogue systems, aiming to detect multiple intents of utterances in low-resource dialogue domains. Previous studies focus on a two-stage pipeline. They first learn representations of utterances with multiple labels and then use a threshold-based strategy to identify multi-label results. However, these methods rely on representation classification and ignore instance relations, leading to error propagation. To solve the above issues, we propose a multi-label joint learning method for few-shot MID in an end-to-end manner, which constructs an instance relation learning network with label knowledge propagation to eliminate error propagation. Concretely, we learn the interaction relations between instances with class information to propagate label knowledge between a few labeled (support set) and unlabeled (query set) instances. With label knowledge propagation, the relation strength between instances directly indicates whether two utterances belong to the same intent for multi-label prediction. Besides, a dual relation-enhanced loss is developed to optimize support- and query-level relation strength to improve performance. Experiments show that we outperform strong baselines by an average of 9.54% AUC and 11.19% Macro-F1 in 1-shot scenarios.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.07776

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation

Romero-Díaz, Jacobo, Gállego, Gerard I., Pareras, Oriol, Costa, Federico, Hernando, Javier, España-Bonet, Cristina

arXiv.org Artificial IntelligenceOct-6-2025

Speech-to-Text Translation (S2TT) systems built from Automatic Speech Recognition (ASR) and Text-to-Text Translation (T2TT) modules face two major limitations: error propagation and the inability to exploit prosodic or other acoustic cues. Chain-of-Thought (CoT) prompting has recently been introduced, with the expectation that jointly accessing speech and transcription will overcome these issues. Analyzing CoT through attribution methods, robustness evaluations with corrupted transcripts, and prosody-awareness, we find that it largely mirrors cascaded behavior, relying mainly on transcripts while barely leveraging speech. Simple training interventions, such as adding Direct S2TT data or noisy transcript injection, enhance robustness and increase speech attribution. These findings challenge the assumed advantages of CoT and highlight the need for architectures that explicitly integrate acoustic information into translation.

artificial intelligence, machine learning, translation, (15 more...)

arXiv.org Artificial Intelligence

2510.03115

Country: Europe > Spain (0.14)

Genre: Research Report (0.64)

Technology: