Collaborating Authors

Zhang, Haibin


Enhancing Object Detection Accuracy in Underwater Sonar Images through Deep Learning-based Denoising

arXiv.org Artificial Intelligence

Xidian University, China; Xidian University, China; Jiangxi University of Science and Technology, China; Institute of Deep-sea Science and Engineering, China

Corresponding authors: Tao Xue, Yanbin Wang.

Abstract: Sonar image object detection is crucial for underwater robotics and other applications. However, various types of noise in sonar images can reduce the accuracy of object detection. Denoising, a critical preprocessing step, aims to remove noise while retaining useful information, thereby improving detection accuracy. Although deep learning-based denoising algorithms perform well on optical images, their application to underwater sonar images remains underexplored. This paper systematically evaluates the effectiveness of several deep learning-based denoising algorithms, originally designed for optical images, in the context of underwater sonar image object detection. We apply nine trained denoising models, each targeting different types of noise, to images from five open-source sonar datasets. We then test the denoised images using four object detection algorithms. The results show that different denoising models have varying effects on detection performance. Combining the strengths of multiple denoising models suppresses noise more effectively and further optimizes the detection results. Additionally, we adopt a multi-frame denoising technique that treats the different outputs generated by multiple denoising models as multiple frames of the same scene and processes them further to enhance detection accuracy. This method, originally designed for optical images, leverages the complementary noise-reduction effects of the individual models. Experimental results show that denoised sonar images improve the performance of object detection algorithms compared to the original sonar images.

I. INTRODUCTION

Underwater sonar imaging plays an indispensable role in marine exploration and various ocean industries, providing valuable insights into underwater environments. Unlike optical imaging, where light propagation in water is limited, sonar systems use sound waves that travel farther, allowing them to cover larger underwater areas. This makes sonar images an ideal choice for applications such as seabed mapping, underwater object detection, and navigation. However, despite these advantages, sonar image quality is often severely compromised by noise, which degrades the accuracy of downstream tasks such as object detection. In sonar images, noise can originate from various factors, including environmental interference, sensor imperfections, and the inherent characteristics of sound wave propagation in water. Common types of sonar image noise include Gaussian noise, speckle noise, and Poisson noise. Gaussian noise typically arises from random fluctuations in sensor readings or environmental changes. Speckle noise, caused by sound wave scattering, manifests as granular interference that can obscure object boundaries.

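The multi-frame idea in the abstract treats the outputs of several denoising models as frames of one scene. The abstract does not describe the fusion step itself, so the sketch below substitutes the simplest possible stand-in, a pixel-wise median (or mean) across frames; it is not the multi-frame denoising method the authors use. `fuse_frames` and the toy `frames` values are hypothetical, and images are plain nested lists of grayscale intensities to keep the example dependency-free.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel intensities.
Image = List[List[float]]


def fuse_frames(
    frames: Sequence[Image],
    reducer: Callable[[Sequence[float]], float] = median,
) -> Image:
    """Fuse K denoised versions of the same sonar image pixel by pixel.

    Each frame is the output of a different denoising model applied to the
    same noisy input. `reducer` collapses the K candidate values at a pixel
    into one; the default median ignores one outlying model when K >= 3.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same shape")
    return [
        [reducer([frame[r][c] for frame in frames]) for c in range(width)]
        for r in range(height)
    ]


# Toy outputs of three hypothetical denoisers; the second one
# wrongly suppresses the bright pixel at row 1, column 0.
frames = [
    [[0.10, 0.50], [0.90, 0.20]],
    [[0.12, 0.48], [0.10, 0.22]],
    [[0.11, 0.52], [0.88, 0.21]],
]
print(fuse_frames(frames))        # → [[0.11, 0.5], [0.88, 0.21]]
print(fuse_frames(frames, mean))  # mean lets the outlier pull pixel (1, 0) down
```

With three frames, the median at pixel (1, 0) ignores the second model's outlier (0.10) and returns 0.88, whereas passing `reducer=mean` lets that outlier pull the fused value down to about 0.63.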

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

arXiv.org Artificial Intelligence

In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state of the art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.