AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

Evaluating Hallucinations in Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions

Park, Hansol, Ahn, Hoseong, Moon, Junwon, Lee, Yejin, Shim, Kyuhong

arXiv.org Artificial IntelligenceOct-13-2025

Hallucinations in vision-language models have been extensively studied using benchmarks that probe reliability in image-text settings. In contrast, the effect of spoken queries on multimodal hallucinations remains largely unexplored, despite the growing role of voice-driven interfaces. In this work, we investigate how spoken input influences hallucinations in multimodal large language models. We present RePOPE-Spk, an audio-augmented extension of the RePOPE benchmark, where queries are provided as speech under diverse acoustic conditions. Using RePOPE-Spk, we systematically evaluate both proprietary and open-source models. Experimental results show that hallucinations escalate when queries are spoken rather than written: error rates increase by 3% under clean speech and by up to 20% with environmental noise. Input order and query length further affect robustness, while strategies such as many-shot prompting and chain-of-thought reasoning offer partial but insufficient mitigation. These findings highlight a critical and underexplored challenge, opening new directions for building reliable voice interface systems.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.08581

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Scalable multilingual PII annotation for responsible AI in LLMs

Meena, Bharti, Skubisz, Joanna, Rajgarhia, Harshit, Dave, Nand, Ganesh, Kiran, Dalmia, Shivali, Mukherji, Abhishek, Sundarababu, Vasudevan

arXiv.org Artificial IntelligenceOct-13-2025

Abstract--As Large Language Models (LLMs) gain wider adoption, ensuring their reliable handling of Personally Identifiable Information (PII) across diverse regulatory contexts has become essential. This work introduces a scalable multilingual data curation framework designed for high-quality PII annotation across 13 underrepresented locales (Table I), covering approximately 336 locale-specific PII types. Our phased, human-in-the-loop annotation methodology combines linguistic expertise with rigorous quality assurance, leading to substantial improvements in recall and false positive rates from pilot, training, and production phases. Beyond reporting empirical gains, we highlight common annotator challenges in multilingual PII labeling and demonstrate how iterative, analytics-driven pipelines can enhance both annotation quality and downstream model reliability. I. Introduction A. PII Data Protection The surge in user-generated content has led to vast textual corpora containing hidden instances of Personally Identifiable Information (PII) in application forms, support tickets, reviews and social media posts [1]. PII--such as NAME, SSN, and PHONE NUMBER--poses significant privacy risks if not handled correctly. Its compromise can result in identity theft, financial fraud, and unauthorized access to sensitive data [2].

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.0625

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

Trestman, Morris, Gugler, Stefan, Faber, Felix A., von Lilienfeld, O. A.

arXiv.org Machine LearningOct-13-2025

Smart training set selections procedures enable the reduction of data needs and improves predictive robustness in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular force norms to guide efficient sampling of configurational spaces of molecules. Numerical evidence is presented for a toy-system (Styblinski-Tang function) as well as for molecular dynamics trajectories from the MD17 dataset. Compared to FPS and uniform sampling, our numerical results indicate superior data efficiency and robustness when using GGFPS. Distribution analysis of the MD17 data suggests that FPS systematically under-samples equilibrium geometries, resulting in large test errors for relaxed structures. GGFPS cures this artifact and (i) enables up to two fold reductions in training cost without sacrificing predictive accuracy compared to FPS in the 2-dimensional Styblinksi-Tang system, (ii) systematically lowers prediction errors for equilibrium as well as strained structures in MD17, and (iii) systematically decreases prediction error variances across all of the MD17 configuration spaces. These results suggest that gradient-aware sampling methods hold great promise as effective training set selection tools, and that naive use of FPS may result in imbalanced training and inconsistent prediction outcomes.

artificial intelligence, configuration, machine learning, (15 more...)

arXiv.org Machine Learning

2510.08906

Country:

Europe (0.46)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Materials (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback

ffee3090eac0aae698b2d77ac5642c2c-Paper-Conference.pdf

Neural Information Processing SystemsOct-11-2025, 00:48:47 GMT

algorithm, hypothesis, probability, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(8 more...)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

f6de5ed486b1802874fe9c70b50277e4-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-11-2025, 00:47:55 GMT

accuracy, curation strategy, dataset, (16 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning Pusen Dong

Neural Information Processing SystemsOct-11-2025, 00:46:02 GMT

Safe reinforcement learning (RL) requires the agent to finish a given task while obeying specific constraints.

constraint, textual constraint, trajectory, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.67)
Education (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion

Neural Information Processing SystemsOct-11-2025, 00:37:37 GMT

The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.

dataset, detection, diffusionfake, (13 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

b4fd162d3e2d015233486a2e313828a7-Paper-Conference.pdf

Neural Information Processing SystemsOct-11-2025, 00:37:22 GMT

accuracy, class distribution, learning, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)

Add feedback

Revisiting, Benchmarking and Understanding Unsupervised Graph Domain Adaptation

Neural Information Processing SystemsOct-11-2025, 00:33:23 GMT

Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of knowledge from a label-rich source graph to an unlabeled target graph under domain discrepancies. Despite the proliferation of methods designed for this emerging task, the lack of standard experimental settings and fair performance comparisons makes it challenging to understand which and when models perform well across different scenarios.

dataset, distribution shift, domain adaptation, (15 more...)

Neural Information Processing Systems

Country: