Visual Degradation


DIQ-H: Evaluating Hallucination Persistence in VLMs Under Temporal Visual Degradation

Lin, Zexin, Wan, Hawen, Zhong, Yebin, Xiaoqiang

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) deployed in safety-critical applications such as autonomous driving must handle continuous visual streams under imperfect conditions. However, existing benchmarks focus on static, high-quality images and ignore temporal degradation and error propagation, which are critical failure modes where transient visual corruption induces hallucinations that persist across subsequent frames. We introduce DIQ-H, the first benchmark for evaluating VLM robustness under dynamic visual degradation in temporal sequences. DIQ-H applies physics-based corruptions, including motion blur, sensor noise, and compression artifacts, and measures hallucination persistence, error recovery, and temporal consistency through multi-turn question-answering tasks. To enable scalable annotation, we propose Uncertainty-Guided Iterative Refinement (UIR), which generates reliable pseudo-ground-truth using lightweight VLMs with uncertainty filtering, achieving a 15.3 percent accuracy improvement. Experiments on 16 state-of-the-art VLMs reveal substantial robustness gaps: even advanced models such as GPT-4o achieve only a 78.5 percent recovery rate, while open-source models fall below 60 percent on temporal consistency. DIQ-H provides a comprehensive platform for evaluating VLM reliability in real-world deployments.
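The abstract describes applying transient, physics-based corruptions to individual frames in a sequence. A minimal sketch of that idea, assuming grayscale frames as NumPy arrays (the kernel size, noise level, and function names are illustrative choices, not the benchmark's actual pipeline):

```python
import numpy as np

def motion_blur(frame: np.ndarray, length: int = 5) -> np.ndarray:
    """Simple horizontal motion blur: average each pixel with its neighbours."""
    kernel = np.ones(length) / length
    # Convolve each row independently (frame is H x W grayscale).
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, frame)

def sensor_noise(frame: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Additive Gaussian sensor noise, clipped to the valid 8-bit range."""
    rng = np.random.default_rng(0)  # seeded for reproducibility
    return np.clip(frame + rng.normal(0.0, sigma, frame.shape), 0, 255)

def degrade_sequence(frames, corrupt_at):
    """Corrupt only the frames whose indices appear in `corrupt_at`,
    leaving the rest clean -- i.e. a transient corruption, after which
    one can probe whether the model's errors persist on clean frames."""
    return [motion_blur(sensor_noise(f)) if i in corrupt_at else f
            for i, f in enumerate(frames)]
```

The key point the benchmark exploits is that only some frames are corrupted: hallucinations measured on the *clean* frames that follow reveal error propagation rather than simple per-frame robustness.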


Underwater Visual-Inertial-Acoustic-Depth SLAM with DVL Preintegration for Degraded Environments

Ding, Shuoshuo, Zhang, Tiedong, Jiang, Dapeng, Lei, Ming

arXiv.org Artificial Intelligence

Visual degradation caused by limited visibility, insufficient lighting, and feature scarcity in underwater environments presents significant challenges to visual-inertial simultaneous localization and mapping (SLAM) systems. To address this, the paper presents an underwater visual-inertial-acoustic-depth SLAM system; the key innovation lies in the tight integration of four distinct sensor modalities to ensure reliable operation even under degraded visual conditions. To mitigate DVL (Doppler velocity log) drift and improve measurement efficiency, we propose a novel velocity-bias-based DVL preintegration strategy. At the frontend, hybrid tracking strategies and acoustic-inertial-depth joint optimization enhance system stability. Additionally, multi-source hybrid residuals are incorporated into a graph optimization framework. Extensive quantitative and qualitative analyses of the proposed system are conducted in both simulated and real-world underwater scenarios. The results demonstrate that our approach outperforms current state-of-the-art stereo visual-inertial SLAM systems in both stability and localization accuracy, exhibiting exceptional robustness, particularly in visually challenging environments. Human activities in ocean engineering and marine science are increasing steadily, encompassing scientific expeditions to study underwater hydrothermal vents and archaeological sites, inspection and maintenance of subsea pipelines and reservoirs, and salvage operations for wrecked aircraft and vessels. Shuoshuo Ding, Tiedong Zhang, and Dapeng Jiang are with the School of Ocean Engineering and Technology & Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Sun Yat-sen University, Zhuhai 519082, China, with the Guangdong Provincial Key Laboratory of Information Technology for Deep Water Acoustics, Zhuhai 519082, China, and also with the Key Laboratory of Comprehensive Observation of Polar Environment (Sun Yat-sen University), Ministry of Education, Zhuhai 519082, China (e-mail: dingshsh5@mail2.sysu.edu.cn).
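The core idea behind velocity-bias-based preintegration is to accumulate DVL velocity measurements between keyframes in a form that can be corrected for an updated bias estimate without re-integrating. A first-order sketch of that bookkeeping, assuming body-frame velocities and ignoring rotation (the paper's full formulation also folds in IMU orientation and is considerably more involved; the class and method names here are hypothetical):

```python
import numpy as np

class DvlPreintegrator:
    """Accumulates DVL velocity measurements between two keyframes.

    The raw velocity integral and the total integration time are stored
    separately, so the relative translation can be re-evaluated for any
    bias estimate b without replaying the measurements:
        delta_p(b) = integral(v_meas dt) - b * total_dt
    """

    def __init__(self):
        self.v_int = np.zeros(3)   # integral of measured velocity
        self.dt_sum = 0.0          # total integration time

    def add(self, v_meas: np.ndarray, dt: float) -> None:
        self.v_int += v_meas * dt
        self.dt_sum += dt

    def delta_position(self, bias: np.ndarray) -> np.ndarray:
        """Relative translation under the given bias estimate."""
        return self.v_int - bias * self.dt_sum
```

Because the bias enters linearly, the backend optimizer can adjust it at every iteration and cheaply re-evaluate the preintegrated residual, which is what makes preintegrated factors efficient in graph optimization.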


RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation

Pan, Shu, Hong, Ziyang, Hu, Zhangrui, Xu, Xiandong, Lu, Wenjie, Hu, Liang

arXiv.org Artificial Intelligence

Visual degradation in underwater environments poses unique and significant challenges, which distinguishes underwater SLAM from popular vision-based SLAM on the ground. In this paper, we propose RUSSO, a robust underwater SLAM system that fuses a stereo camera, an inertial measurement unit (IMU), and imaging sonar to achieve robust and accurate 6-degrees-of-freedom (DoF) localization in challenging underwater environments. During visual degradation, the system is reduced to a sonar-inertial system estimating 3-DoF poses. The sonar pose estimation serves as a strong prior for IMU propagation, thereby enhancing the reliability of pose estimation with IMU propagation. Additionally, we propose a SLAM initialization method that leverages the imaging sonar to counteract the lack of visual features during the initialization stage of SLAM. We extensively validate RUSSO through experiments in simulated, pool, and sea scenarios. The results demonstrate that RUSSO achieves better robustness and localization accuracy compared to state-of-the-art visual-inertial SLAM systems, especially in visually challenging scenarios. To the best of our knowledge, this is the first time a stereo camera, IMU, and imaging sonar have been fused to realize robust underwater SLAM against visual degradation.
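The graceful degradation the abstract describes, full 6-DoF visual-inertial estimation that falls back to 3-DoF sonar-inertial estimation when vision fails, amounts to a mode switch driven by visual-tracking health. A minimal sketch, assuming tracked-feature count is the health signal (the threshold and names are made up for illustration; RUSSO's actual switching criterion is not specified in the abstract):

```python
from enum import Enum, auto

class Mode(Enum):
    VISUAL_INERTIAL = auto()   # full 6-DoF estimation
    SONAR_INERTIAL = auto()    # degraded 3-DoF fallback

def select_mode(tracked_features: int, min_features: int = 20) -> Mode:
    """Fall back to sonar-inertial estimation when too few visual
    features survive tracking; otherwise run the full pipeline."""
    if tracked_features >= min_features:
        return Mode.VISUAL_INERTIAL
    return Mode.SONAR_INERTIAL
```

The design point is that the fallback mode never goes fully dead-reckoning: sonar poses keep constraining IMU propagation, so drift stays bounded until visual features return.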