Collaborating Authors

 Li, Jingyuan


Enhancing Object Detection Accuracy in Underwater Sonar Images through Deep Learning-based Denoising

arXiv.org Artificial Intelligence

Xidian University, China; Xidian University, China; Jiangxi University of Science and Technology, China; Institute of Deep-sea Science and Engineering, China

Corresponding authors: Tao Xue, Yanbin Wang.

Abstract: Sonar image object detection is crucial for underwater robotics and other applications. However, various types of noise in sonar images can affect the accuracy of object detection. Denoising, as a critical preprocessing step, aims to remove noise while retaining useful information to improve detection accuracy. Although deep learning-based denoising algorithms perform well on optical images, their application to underwater sonar images remains underexplored. This paper systematically evaluates the effectiveness of several deep learning-based denoising algorithms, originally designed for optical images, in the context of underwater sonar image object detection. We apply nine trained denoising models to images from five open-source sonar datasets, each processing different types of noise. We then test the denoised images using four object detection algorithms. The results show that different denoising models have varying effects on detection performance. By combining the strengths of multiple denoising models, the detection results can be optimized, thus more effectively suppressing noise. Additionally, we adopt a multi-frame denoising technique, using different outputs generated by multiple denoising models as multiple frames of the same scene for further processing to enhance detection accuracy. This method, originally designed for optical images, leverages complementary noise-reduction effects. Experimental results show that denoised sonar images improve the performance of object detection algorithms compared to the original sonar images.

I. INTRODUCTION

Underwater sonar imaging plays an indispensable role in marine exploration and various ocean industries, providing valuable insights into underwater environments. Unlike optical imaging, where light propagation is restricted, sonar systems utilize sound waves that travel farther, allowing them to cover larger underwater areas. This makes sonar images an ideal choice for applications such as seabed mapping, underwater object detection, and navigation. However, despite the advantages of sonar imaging, its image quality is often severely compromised by noise, which negatively impacts the accuracy of downstream tasks, such as object detection. In sonar images, noise can originate from various factors, including environmental interference, sensor imperfections, and the inherent characteristics of sound wave propagation in water. Common types of sonar image noise include Gaussian noise, speckle noise, and Poisson noise. Gaussian noise typically arises from random fluctuations in sensor readings or environmental changes. Speckle noise, caused by sound wave scattering, manifests as granular interference, which can obscure object boundaries.
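The three noise types named in the introduction can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the tiny 2x2 "image", the noise levels, and the `peak` scaling for Poisson noise are all assumptions chosen for illustration. Gaussian noise is modeled as additive, speckle as multiplicative, and Poisson as signal-dependent counting noise.

```python
import math
import random


def add_gaussian_noise(image, sigma, rng):
    """Additive noise: each pixel gets an independent N(0, sigma^2) offset."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]


def add_speckle_noise(image, sigma, rng):
    """Multiplicative noise: each pixel is scaled by (1 + N(0, sigma^2))."""
    return [[p * (1.0 + rng.gauss(0.0, sigma)) for p in row] for row in image]


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_poisson_noise(image, peak, rng):
    """Signal-dependent noise: a pixel p in [0, 1] becomes Poisson(p * peak) / peak."""
    return [[sample_poisson(p * peak, rng) / peak for p in row] for row in image]


# Hypothetical 2x2 intensity image with values in [0, 1].
rng = random.Random(0)
clean = [[0.2, 0.5], [0.8, 0.0]]
noisy_gaussian = add_gaussian_noise(clean, sigma=0.05, rng=rng)
noisy_speckle = add_speckle_noise(clean, sigma=0.3, rng=rng)
noisy_poisson = add_poisson_noise(clean, peak=30.0, rng=rng)
```

In this sketch a zero-intensity pixel stays at zero under the speckle and Poisson models but not under the Gaussian one, which is one way to see that the latter two depend on the signal while additive Gaussian noise does not.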


COMET:Combined Matrix for Elucidating Targets

arXiv.org Artificial Intelligence

Identifying the interaction targets of bioactive compounds is a foundational element for deciphering their pharmacological effects. Target prediction algorithms equip researchers with an effective tool to rapidly scope and explore potential targets. Here, we introduce COMET, a multi-technological modular target prediction tool that provides comprehensive predictive insights, including similar active compounds, three-dimensional predicted binding modes, and probability scores, all within an average processing time of less than 10 minutes per task. With meticulously curated data, the COMET database encompasses 990,944 drug-target interaction pairs and 45,035 binding pockets, enabling predictions for 2,685 targets, which span confirmed and exploratory therapeutic targets for human diseases. In comparative testing using datasets from ChEMBL and BindingDB, COMET outperformed five other well-known algorithms, offering nearly an 80% probability of accurately identifying at least one true target within the top 15 predictions for a given compound. COMET also features a user-friendly web server, freely accessible at https://www.pdbbind-plus.org.cn/comet.


Brain-to-Text Benchmark '24: Lessons Learned

arXiv.org Artificial Intelligence

Speech brain-computer interfaces aim to decipher what a person is trying to say from neural activity alone, restoring communication to people with paralysis who have lost the ability to speak intelligibly. The Brain-to-Text Benchmark '24 and associated competition was created to foster the advancement of decoding algorithms that convert neural activity to text. Here, we summarize the lessons learned from the competition ending on June 1, 2024 (the top 4 entrants also presented their experiences in a recorded webinar). The largest improvements in accuracy were achieved using an ensembling approach, where the output of multiple independent decoders was merged using a fine-tuned large language model (an approach used by all 3 top entrants). Performance gains were also found by improving how the baseline recurrent neural network (RNN) model was trained, including by optimizing learning rate scheduling and by using a diphone training objective. Improving upon the model architecture itself proved more difficult, however, with attempts to use deep state space models or transformers not yet appearing to offer a benefit over the RNN baseline. The benchmark will remain open indefinitely to support further work towards increasing the accuracy of brain-to-text algorithms.


Feynman-Kac Operator Expectation Estimator

arXiv.org Machine Learning

The Feynman-Kac Operator Expectation Estimator (FKEE) is an innovative method for estimating the target mathematical expectation $\mathbb{E}_{X\sim P}[f(X)]$ without relying on a large number of samples, in contrast to the commonly used Markov Chain Monte Carlo (MCMC) expectation estimator. FKEE comprises diffusion bridge models and an approximation of the Feynman-Kac operator. The key idea is to use the solution to the Feynman-Kac equation at the initial time, $u(x_0,0)=\mathbb{E}[f(X_T)|X_0=x_0]$. We use Physics-Informed Neural Networks (PINNs) to approximate the Feynman-Kac operator, which enables the incorporation of diffusion bridge models into the expectation estimator and significantly improves the efficiency of using data while substantially reducing the variance. The diffusion bridge model is a more general MCMC method. In order to incorporate extensive MCMC algorithms, we propose a new diffusion bridge model based on the minimum Wasserstein distance. This diffusion bridge model is universal and reduces the training time of the PINN. FKEE also reduces the adverse impact of the curse of dimensionality and weakens the assumptions on the distribution of $X$ and the performance function $f$ in the general MCMC expectation estimator. The theoretical properties of this universal diffusion bridge model are also shown. Finally, we demonstrate the advantages and potential applications of this method through various concrete experiments, including the challenging task of approximating the partition function in random graph models such as the Ising model.
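The identity $u(x_0,0)=\mathbb{E}[f(X_T)|X_0=x_0]$ can be checked numerically in a toy setting. The Python sketch below is not FKEE and uses no PINN or diffusion bridge; it assumes the simplest possible diffusion (standard Brownian motion, $dX_t = dW_t$) and a hypothetical test function $f(x)=x^2$, for which the Feynman-Kac equation $u_t + \tfrac{1}{2}u_{xx} = 0$ with $u(x,T)=x^2$ has the closed-form solution $u(x,t)=x^2+(T-t)$. It compares that solution with a plain sample-based estimate.

```python
import math
import random


def simulate_terminal_state(x0, horizon, n_steps, rng):
    """Euler-Maruyama path of dX_t = dW_t started at x0; returns X_T."""
    dt = horizon / n_steps
    x = x0
    for _ in range(n_steps):
        x += math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def monte_carlo_expectation(f, x0, horizon, n_paths, n_steps, rng):
    """Sample-based estimate of E[f(X_T) | X_0 = x0]."""
    total = 0.0
    for _ in range(n_paths):
        total += f(simulate_terminal_state(x0, horizon, n_steps, rng))
    return total / n_paths


def closed_form_u(x0, t, horizon):
    """Solution of u_t + 0.5 * u_xx = 0 with u(x, T) = x^2."""
    return x0 ** 2 + (horizon - t)


rng = random.Random(0)
estimate = monte_carlo_expectation(
    lambda x: x * x, x0=1.0, horizon=1.0, n_paths=20000, n_steps=10, rng=rng
)
exact = closed_form_u(1.0, 0.0, 1.0)
print(f"Monte Carlo: {estimate:.3f}, PDE solution at t=0: {exact:.3f}")
```

The sampling route needs many paths to shrink its error, whereas evaluating the PDE solution at $(x_0, 0)$ needs none once $u$ is known; in this toy case $u$ is available in closed form, which is the role the abstract assigns to the learned approximation.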


Event-Based Eye Tracking. AIS 2024 Challenge Survey

arXiv.org Artificial Intelligence

This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.


LSDH: A Hashing Approach for Large-Scale Link Prediction in Microblogs

AAAI Conferences

One challenge of link prediction in online social networks is the large scale of many such networks. The measures used in existing work do not take computational cost into account in the large-scale setting. We propose the notion of social distance in a multi-dimensional form to measure the closeness among a group of people in microblogs. We also propose a fast hashing approach called Locality-sensitive Social Distance Hashing (LSDH), which works in an unsupervised setup and performs approximate near neighbor search without high-dimensional distance computation. Experiments were conducted on a Twitter dataset, and the preliminary results demonstrate the effectiveness of LSDH in predicting the likelihood of future associations between people.
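The general idea of locality-sensitive hashing for approximate near neighbor search can be sketched in Python as follows. This is not LSDH: the abstract does not specify its hash functions, so the sketch swaps in the well-known random-hyperplane scheme, and the user names and feature vectors are hypothetical. Vectors that point in similar directions tend to receive the same bit signature, so candidate neighbors are found by a bucket lookup instead of pairwise distance computation.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, rng):
    """Draw n_bits random hyperplane normals with Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item names by signature so that lookups are a single dict access."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[signature(vec, hyperplanes)].append(name)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return the names stored in the query's bucket (approximate neighbors)."""
    return buckets.get(signature(query, hyperplanes), [])


# Hypothetical user feature vectors (for example, interaction profiles).
rng = random.Random(0)
users = {
    "alice": [0.9, 0.1, 0.0, 0.3],
    "bob": [0.8, 0.2, 0.1, 0.4],
    "carol": [-0.7, 0.9, -0.2, 0.0],
}
planes = make_hyperplanes(n_bits=8, dim=4, rng=rng)
index = build_index(users, planes)
print(candidates([0.9, 0.1, 0.0, 0.3], index, planes))
```

More bits make buckets more selective at the cost of missing some true neighbors; practical systems usually combine several independent hash tables to recover recall.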