Oceania
Wake-Informed 3D Path Planning for Autonomous Underwater Vehicles Using A* and Neural Network Approximations
Cooper-Baldock, Zachary, Turnock, Stephen, Sammut, Karl
Autonomous Underwater Vehicles (AUVs) encounter significant energy, control and navigation challenges in complex underwater environments, particularly during close-proximity operations such as launch and recovery (LAR), where fluid interactions and wake effects present additional navigational and energy challenges. Traditional path planning methods fail to incorporate these detailed wake structures, resulting in increased energy consumption, reduced control stability, and heightened safety risks. This paper presents a novel wake-informed, 3D path planning approach that fully integrates localized wake effects and global currents into the planning algorithm. Two variants of the A* algorithm, a current-informed planner and a wake-informed planner, are created to assess its validity, and two neural network models are then trained to approximate these planners for real-time applications. Both the A* planners and NN models are evaluated using important metrics such as energy expenditure, path length, and encounters with high-velocity and turbulent regions. The results demonstrate that the wake-informed A* planner consistently achieves the lowest energy expenditure and minimizes encounters with high-velocity regions, reducing energy consumption by up to 11.3%. The neural network models are observed to offer a computational speedup of 6 orders of magnitude, but exhibit 4.51-19.79% higher energy expenditures and 9.81-24.38% less optimal paths. These findings underscore the importance of incorporating detailed wake structures into traditional path planning algorithms and the benefits of neural network approximations to enhance energy efficiency and operational safety for AUVs in complex 3D domains.
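As a rough illustration of flow-aware planning, the sketch below runs A* on a small 6-connected 3D grid and makes it more expensive to enter cells with high local flow speed. It is not the authors' planner: the function name `plan_path`, the `speed` lookup, the penalty weight `k`, and the cost model (step length scaled by `1 + k * speed`) are all assumptions chosen for brevity.

```python
import heapq
import math

def plan_path(shape, start, goal, speed, k=1.0):
    """A* on a 6-connected 3D grid with a flow-speed penalty.

    shape: (nx, ny, nz) grid size.
    speed: dict mapping a cell to its local flow speed (hypothetical input;
           cells missing from the dict count as still water).
    k:     hypothetical weight on the flow penalty.
    """
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    def h(cell):
        # Euclidean distance never overestimates, since every step costs >= 1.
        return math.dist(cell, goal)

    best = {start: 0.0}      # cheapest known cost to reach each cell
    parent = {}              # back-pointers for path reconstruction
    frontier = [(h(start), start)]
    while frontier:
        f, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1], best[goal]
        if f > best[cell] + h(cell):
            continue         # stale queue entry, a cheaper route was found later
        for d in moves:
            nxt = tuple(c + dc for c, dc in zip(cell, d))
            if not all(0 <= c < n for c, n in zip(nxt, shape)):
                continue
            # Assumed cost model: unit step scaled up by the flow speed entered.
            g = best[cell] + 1.0 + k * speed.get(nxt, 0.0)
            if g < best.get(nxt, math.inf):
                best[nxt] = g
                parent[nxt] = cell
                heapq.heappush(frontier, (g + h(nxt), nxt))
    return None, math.inf
```

Calling `plan_path((3, 3, 1), (0, 0, 0), (2, 0, 0), {(1, 0, 0): 5.0})` routes around the fast cell at `(1, 0, 0)` because the two-step direct route costs more than the four-step detour under this cost model.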
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
Alawode, Basit, Ganapathi, Iyyakutti Iyappan, Javed, Sajid, Werghi, Naoufel, Bennamoun, Mohammed, Mahmood, Arif
The preservation of aquatic biodiversity is critical in mitigating the effects of climate change. Aquatic scene understanding plays a pivotal role in aiding marine scientists in their decision-making processes. In this paper, we introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. AquaticCLIP presents a new unsupervised learning framework that aligns images and texts in aquatic environments, enabling tasks such as segmentation, classification, detection, and object counting. By leveraging our large-scale underwater image-text paired dataset without the need for ground-truth annotations, our model enriches existing vision-language models in the aquatic domain. For this purpose, we construct a dataset of 2 million underwater image-text pairs using heterogeneous resources, including YouTube, Netflix, and NatGeo. To fine-tune AquaticCLIP, we propose a prompt-guided vision encoder that progressively aggregates patch features via learnable prompts, while a vision-guided mechanism enhances the language encoder by incorporating visual context. The model is optimized through a contrastive pretraining loss to align visual and textual modalities. AquaticCLIP achieves notable performance improvements in zero-shot settings across multiple underwater computer vision tasks, outperforming existing methods in both robustness and interpretability. Our model sets a new benchmark for vision-language applications in underwater environments. The code and dataset for AquaticCLIP are publicly available on GitHub at xxx.
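The abstract does not spell out its contrastive pretraining loss, so the following is only a minimal sketch of the standard CLIP-style symmetric objective, written in plain Python for readability. The function name `clip_loss`, the temperature default, and the list-of-vectors input format are assumptions, not details from AquaticCLIP.

```python
import math

def clip_loss(img, txt, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings.

    img, txt: lists of equal-length vectors; row i of each is a matched pair.
    """
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def cross_entropy_diag(logits):
        # Mean of -log softmax(row i)[i]: the matched pair is the correct class.
        total = 0.0
        for i, row in enumerate(logits):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(logits)

    img = [unit(v) for v in img]
    txt = [unit(v) for v in txt]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

The loss is small when each image embedding is closest to its own caption and grows when the pairing is shuffled, which is the alignment pressure the abstract describes.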
Rational Gaussian wavelets and corresponding model driven neural networks
Ámon, Attila Miklós, Fenech, Kristian, Kovács, Péter, Dózsa, Tamás
In this paper we consider the continuous wavelet transform using Gaussian wavelets multiplied by an appropriate rational term. The zeros and poles of this rational modifier act as free parameters and their choice highly influences the shape of the mother wavelet. This allows the proposed construction to approximate signals with complex morphology using only a few wavelet coefficients. We show that the proposed rational Gaussian wavelets are admissible and provide numerical approximations of the wavelet coefficients using variable projection operators. In addition, we show how the proposed variable projection based rational Gaussian wavelet transform can be used in neural networks to obtain a highly interpretable feature learning layer. We demonstrate the effectiveness of the proposed scheme through a biomedical application, namely, the detection of ventricular ectopic beats (VEBs) in real ECG measurements.
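To make the idea of a rational modifier concrete, the sketch below evaluates a Gaussian multiplied by a rational term whose zeros and poles are free parameters. The exact parameterization used in the paper is not given in the abstract, so the form R(t) * exp(-t^2 / 2) with R(t) = prod(t - z) / prod(t - p), and the name `rational_gaussian`, are assumptions made for illustration.

```python
import math

def rational_gaussian(t, zeros, poles):
    """Evaluate R(t) * exp(-t^2 / 2), with R(t) = prod(t - z) / prod(t - p).

    zeros, poles: iterables of (possibly complex) numbers. Poles are assumed
    to lie off the real axis so the value is finite for every real t.
    """
    num = 1 + 0j
    for z in zeros:
        num *= (t - z)
    den = 1 + 0j
    for p in poles:
        den *= (t - p)
    return (num / den) * math.exp(-t * t / 2)
```

For example, a zero at 0 with the conjugate pole pair `1j, -1j` gives t / (t^2 + 1) * exp(-t^2 / 2), an odd and therefore zero-mean shape; moving the zeros and poles changes the waveform without changing the code.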
GRADIEND: Monosemantic Feature Learning within Neural Networks Applied to Gender Debiasing of Transformer Models
Drechsel, Jonathan, Herbold, Steffen
AI systems frequently exhibit and amplify social biases, including gender bias, leading to harmful consequences in critical areas. This study introduces a novel encoder-decoder approach that leverages model gradients to learn a single monosemantic feature neuron encoding gender information. We show that our method can be used to debias transformer-based language models. We hypothesize that these gradients contain valuable information for identifying and modifying gender-specific features. Our method aims to learn a feature neuron that encodes gender information from the input, i.e., model gradients. Unlike existing approaches for extracting monosemantic features (e.g., Bricken et al. (2023)), our approach enables the learning of a feature neuron with a desired, interpretable meaning, such as gender.
Learning to Learn Weight Generation via Trajectory Diffusion
Guan, Yunchuan, Liu, Yu, Zhou, Ke, Shen, Zhiqi, Belongie, Serge, Hwang, Jenq-Neng, Li, Lei
Diffusion-based algorithms have emerged as promising techniques for weight generation, particularly in scenarios like multi-task learning that require frequent weight updates. However, existing solutions suffer from limited cross-task transferability. In addition, they only utilize optimal weights as training samples, ignoring the value of other weights in the optimization process. To address these issues, we propose Lt-Di, which integrates the diffusion algorithm with meta-learning to generate weights for unseen tasks. Furthermore, we extend the vanilla diffusion algorithm into a trajectory diffusion algorithm to utilize other weights along the optimization trajectory. Trajectory diffusion decomposes the entire diffusion chain into multiple shorter ones, improving training and inference efficiency. We analyze the convergence properties of the weight generation paradigm and improve convergence efficiency without additional time overhead. Our experiments demonstrate Lt-Di's higher accuracy while reducing computational overhead across various tasks, including zero-shot and few-shot learning, multi-domain generalization, and large-scale language model fine-tuning. Our code is released at https://github.com/tuantuange/Lt-Di.
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents
Zhang, Zhizhen, Zhu, Lei, Fang, Zhen, Huang, Zi, Luo, Yadan
Pre-training vision-language representations on human action videos has emerged as a promising approach to reduce reliance on large-scale expert demonstrations for training embodied agents. However, prior methods often employ time contrastive learning based on goal-reaching heuristics, progressively aligning language instructions from the initial to the final frame. This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments at the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraints. AcTOL treats a video as a continuous trajectory where it (1) contrasts semantic differences between frames to reflect their natural ordering, and (2) imposes a local Brownian bridge constraint to ensure smooth transitions across intermediate frames. Extensive imitation learning experiments across varying numbers of demonstrations show that the pretrained features significantly enhance downstream manipulation tasks by up to 49% with high robustness to different linguistic styles of instructions, offering a viable pathway toward generalized embodied agents. The source code is included in the supplementary material for reference.
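One way to read a Brownian bridge constraint is as a penalty on how far intermediate frame embeddings stray from the straight line between the two endpoint embeddings, weighted by the bridge variance t(T - t)/T. The sketch below implements that reading for a single window of frames; the name `bridge_penalty`, the unit diffusion scale, and the averaging over interior frames are assumptions, and the loss actually used by AcTOL may differ.

```python
def bridge_penalty(frames):
    """Variance-weighted squared deviation of interior embeddings from the
    mean of a Brownian bridge pinned at the first and last frame.

    frames: list of equal-length embedding vectors for one local window.
    """
    T = len(frames) - 1
    first, last = frames[0], frames[-1]
    total = 0.0
    for t in range(1, T):
        # Bridge statistics at step t: the mean interpolates the endpoints,
        # and the variance t(T - t)/T is largest mid-window.
        var = t * (T - t) / T
        dev = sum((x - (a + (t / T) * (b - a))) ** 2
                  for x, a, b in zip(frames[t], first, last))
        total += dev / (2.0 * var)
    return total / max(T - 1, 1)
```

Embeddings that move evenly from the first frame to the last incur no penalty, while a frame that jumps away from that path is penalized most heavily near the window ends, where the bridge variance is smallest.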
FireCastNet: Earth-as-a-Graph for Seasonal Fire Prediction
Michail, Dimitrios, Davalas, Charalampos, Panagiotou, Lefki-Ioanna, Prapas, Ioannis, Kondylatos, Spyros, Bountos, Nikolaos Ioannis, Papoutsis, Ioannis
With climate change expected to exacerbate fire weather conditions, the accurate and timely anticipation of wildfires becomes increasingly crucial for disaster mitigation. In this study, we utilize SeasFire, a comprehensive global wildfire dataset with climate, vegetation, oceanic indices, and human-related variables, to enable seasonal wildfire forecasting with machine learning. For the predictive analysis, we present FireCastNet, a novel architecture which combines a 3D convolutional encoder with GraphCast, originally developed for global short-term weather forecasting using graph neural networks. FireCastNet is trained to capture the context leading to wildfires, at different spatial and temporal scales. Our investigation focuses on assessing the effectiveness of our model in predicting the presence of burned areas at varying forecasting time horizons globally, extending up to six months into the future, and on how different spatial and/or temporal context affects the performance. Our findings demonstrate the potential of deep learning models in seasonal fire forecasting; longer input time-series lead to more robust predictions, while integrating spatial information to capture wildfire spatio-temporal dynamics boosts performance. Finally, our results hint that in order to enhance performance at longer forecasting horizons, a larger spatial receptive field needs to be considered.
A Comprehensive Study of Bug-Fix Patterns in Autonomous Driving Systems
Chen, Yuntianyi, Huai, Yuqi, He, Yirui, Li, Shilong, Hong, Changnam, Chen, Qi Alfred, Garcia, Joshua
As autonomous driving systems (ADSes) become increasingly complex and integral to daily life, the importance of understanding the nature and mitigation of software bugs in these systems has grown correspondingly. Addressing the challenges of software maintenance in autonomous driving systems (e.g., handling real-time system decisions and ensuring safety-critical reliability) is crucial due to the unique combination of real-time decision-making requirements and the high stakes of operational failures in ADSes. The potential of automated tools in this domain is promising, yet there remains a gap in our comprehension of the challenges faced and the strategies employed during manual debugging and repair of such systems. In this paper, we present an empirical study that investigates bug-fix patterns in ADSes, with the aim of improving reliability and safety. We have analyzed the commit histories and bug reports of two major autonomous driving projects, Apollo and Autoware, covering 1,331 bug fixes, and studied their bug symptoms, root causes, and bug-fix patterns. Our study reveals several dominant bug-fix patterns, including those related to path planning, data flow, and configuration management. Additionally, we find that the frequency distribution of bug-fix patterns varies significantly depending on their nature and types and that certain categories of bugs are recurrent and more challenging to eliminate. Based on our findings, we propose a hierarchy of ADS bugs and two taxonomies of 15 syntactic bug-fix patterns and 27 semantic bug-fix patterns that offer guidance for bug identification and resolution. We also contribute a benchmark of 1,331 ADS bug-fix instances.
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Guan, Xinyan, Zeng, Jiali, Meng, Fandong, Xin, Chunlei, Lu, Yaojie, Lin, Hongyu, Han, Xianpei, Sun, Le, Zhou, Jie
Large Language Models (LLMs) have shown remarkable potential in reasoning while they still suffer from severe factual hallucinations due to timeliness, accuracy, and coverage of parametric knowledge. Meanwhile, integrating reasoning with retrieval-augmented generation (RAG) remains challenging due to ineffective task decomposition and redundant retrieval, which can introduce noise and degrade response quality. In this paper, we propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP), enabling strategic and adaptive retrieval. By iteratively decomposing queries, DeepRAG dynamically determines whether to retrieve external knowledge or rely on parametric reasoning at each step. Experiments show that DeepRAG improves retrieval efficiency while improving answer accuracy by 21.99%, demonstrating its effectiveness in optimizing retrieval-augmented reasoning.
Societal Attitudes Toward Service Robots: Adore, Abhor, Ignore, or Unsure?
Yoganathan, V., Osburg, V. -S., Colladon, A. Fronzetti, Charles, V., Toporowski, W.
Societal or population-level attitudes are aggregated patterns of different individual attitudes, representing collective general predispositions. As service robots become ubiquitous, understanding attitudes towards them at the population (vs. individual) level enables firms to expand robot services to a broad (vs. niche) market. Targeting population-level attitudes would benefit service firms because: (1) they are more persistent, thus, stronger predictors of behavioral patterns and (2) this approach is less reliant on personal data, whereas individualized services are vulnerable to AI-related privacy risks. As for service theory, ignoring broad unobserved differences in attitudes produces biased conclusions, and our systematic review of previous research highlights a poor understanding of potential heterogeneity in attitudes toward service robots. We present five diverse studies (S1-S5), utilizing multinational and "real world" data (Ntotal = 89,541; years: 2012-2024). Results reveal a stable structure comprising four distinct attitude profiles (S1-S5): positive ("adore"), negative ("abhor"), indifferent ("ignore"), and ambivalent ("unsure"). The psychological need for interacting with service staff, and for autonomy and relatedness in technology use, function as attitude profile antecedents (S2). Importantly, the attitude profiles predict differences in post-interaction discomfort and anxiety (S3), satisfaction ratings and service evaluations (S4), and perceived sociability and uncanniness based on a robot's humanlikeness (S5).