Goto

Collaborating Authors

 onference


Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance

arXiv.org Artificial Intelligence

Economic constraints, limited availability of datasets for reproducibility and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement of predictive maintenance (PdM) in the automotive sector. Recent progress in large language models (LLMs) presents an opportunity to overcome these barriers and speed up the transition of PdM from research to industrial practice. Under these conditions, we explore the potential of LLM-based agents to support PdM cleaning pipelines. Specifically, we focus on maintenance logs, a critical data source for training well-performing machine learning (ML) models, but one often affected by errors such as typos, missing fields, near-duplicate entries, and incorrect dates. We evaluate LLM agents on cleaning tasks involving six distinct types of noise. Our findings show that LLMs are effective at handling generic cleaning tasks and offer a promising foundation for future industrial applications. While domain-specific errors remain challenging, these results highlight the potential for further improvements through specialized training and enhanced agentic capabilities.


Bias Mitigation for AI-Feedback Loops in Recommender Systems: A Systematic Literature Review and Taxonomy

arXiv.org Artificial Intelligence

Recommender systems continually retrain on user reactions to their own predictions, creating AI feedback loops that amplify biases and diminish fairness over time. Despite this well-known risk, most bias mitigation techniques are tested only on static splits, so their long-term fairness across multiple retraining rounds remains unclear. We therefore present a systematic literature review of bias mitigation methods that explicitly consider AI feedback loops and are validated in multi-round simulations or live A/B tests. Screening 347 papers yields 24 primary studies published between 2019-2025. Each study is coded on six dimensions: mitigation technique, biases addressed, dynamic testing set-up, evaluation focus, application domain, and ML task, organising them into a reusable taxonomy. The taxonomy offers industry practitioners a quick checklist for selecting robust methods and gives researchers a clear roadmap to the field's most urgent gaps. Examples include the shortage of shared simulators, varying evaluation metrics, and the fact that most studies report either fairness or performance; only six use both.


Infinite Leagues Under the Sea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models

arXiv.org Artificial Intelligence

This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit downgraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model to generate hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain massive real seafloor observations and covering large areas, but are prone to noise and artifacts from the real world. We extract 3D geometry and semantics from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3DGS model supervised by 2D diffusion priors which allows photorealistic novel view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work drives impact in multiple domains, spanning filming, gaming, and robot simulation.


SwarmGPT-Primitive: A Language-Driven Choreographer for Drone Swarms Using Safe Motion Primitive Composition

arXiv.org Artificial Intelligence

Catalyzed by advancements in hardware and software, drone performances are increasingly making their mark in the entertainment industry. However, designing smooth and safe choreographies for drone swarms is complex and often requires expert domain knowledge. In this work, we introduce SwarmGPT-Primitive, a language-based choreographer that integrates the reasoning capabilities of large language models (LLMs) with safe motion planning to facilitate deployable drone swarm choreographies. The LLM composes choreographies for a given piece of music by utilizing a library of motion primitives; the language-based choreographer is augmented with an optimization-based safety filter, which certifies the choreography for real-world deployment by making minimal adjustments when feasibility and safety constraints are violated. The overall SwarmGPT-Primitive framework decouples choreographic design from safe motion planning, which allows non-expert users to re-prompt and refine compositions without concerns about compliance with constraints such as avoiding collisions or downwash effects or satisfying actuation limits. We demonstrate our approach through simulations and experiments with swarms of up to 20 drones performing choreographies designed based on various songs, highlighting the system's ability to generate effective and synchronized drone choreographies for real-world deployment.


Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

arXiv.org Artificial Intelligence

With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms, often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we proposed a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of the standard PPO. Additionally, a reward-penalty adjustment (RPA) is implemented to penalize unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments conducted in a simulated platform across various anatomical colon segments demonstrate that our model effectively and safely guides RDE.


When SparseMoE Meets Noisy Interactions: An Ensemble View on Denoising Recommendation

arXiv.org Artificial Intelligence

Learning user preferences from implicit feedback is one of the core challenges in recommendation. The difficulty lies in the potential noise within implicit feedback. Therefore, various denoising recommendation methods have been proposed recently. However, most of them overly rely on the hyperparameter configurations, inevitably leading to inadequacies in model adaptability and generalization performance. In this study, we propose a novel Adaptive Ensemble Learning (AEL) for denoising recommendation, which employs a sparse gating network as a brain, selecting suitable experts to synthesize appropriate denoising capacities for different data samples. To address the ensemble learning shortcoming of model complexity and ensure sub-recommender diversity, we also proposed a novel method that stacks components to create sub-recommenders instead of directly constructing them. Extensive experiments across various datasets demonstrate that AEL outperforms others in kinds of popular metrics, even in the presence of substantial and dynamic noise. Our code is available at https://github.com/cpu9xx/AEL.


Adversarial Attacks and Defenses in Multivariate Time-Series Forecasting for Smart and Connected Infrastructures

arXiv.org Artificial Intelligence

The emergence of deep learning models has revolutionized various industries over the last decade, leading to a surge in connected devices and infrastructures. However, these models can be tricked into making incorrect predictions with high confidence, leading to disastrous failures and security concerns. To this end, we explore the impact of adversarial attacks on multivariate time-series forecasting and investigate methods to counter them. Specifically, we employ untargeted white-box attacks, namely the Fast Gradient Sign Method (FGSM) and the Basic Iterative Method (BIM), to poison the inputs to the training process, effectively misleading the model. We also illustrate the subtle modifications to the inputs after the attack, which makes detecting the attack using the naked eye quite difficult. Having demonstrated the feasibility of these attacks, we develop robust models through adversarial training and model hardening. We are among the first to showcase the transferability of these attacks and defenses by extrapolating our work from the benchmark electricity data to a larger, 10-year real-world data used for predicting the time-to-failure of hard disks. Our experimental results confirm that the attacks and defenses achieve the desired security thresholds, leading to a 72.41% and 94.81% decrease in RMSE for the electricity and hard disk datasets respectively after implementing the adversarial defenses.


On a Scale-Invariant Approach to Bundle Recommendations in Candy Crush Saga

arXiv.org Artificial Intelligence

A good understanding of player preferences is crucial for increasing content relevancy, especially in mobile games. This paper illustrates the use of attentive models for producing item recommendations in a mobile game scenario. The methodology comprises a combination of supervised and unsupervised approaches to create user-level recommendations while introducing a novel scale-invariant approach to the prediction. The methodology is subsequently applied to a bundle recommendation in Candy Crush Saga. The strategy of deployment, maintenance, and monitoring of ML models that are scaled up to serve millions of users is presented, along with the best practices and design patterns adopted to minimize technical debt typical of ML systems. The recommendation approach is evaluated both offline and online, with a focus on understanding the increase in engagement, click- and take rates, novelty effects, recommendation diversity, and the impact of degenerate feedback loops. We have demonstrated that the recommendation enhances user engagement by 30% concerning click rate and by more than 40% concerning take rate. In addition, we empirically quantify the diminishing effects of recommendation accuracy on user engagement.


Cluster-Wide Task Slowdown Detection in Cloud System

arXiv.org Artificial Intelligence

Slow task detection is a critical problem in cloud operation and maintenance since it is highly related to user experience and can bring substantial liquidated damages. Most anomaly detection methods detect it from a single-task aspect. However, considering millions of concurrent tasks in large-scale cloud computing clusters, it becomes impractical and inefficient. Moreover, single-task slowdowns are very common and do not necessarily indicate a malfunction of a cluster due to its violent fluctuation nature in a virtual environment. Thus, we shift our attention to cluster-wide task slowdowns by utilizing the duration time distribution of tasks across a cluster, so that the computation complexity is not relevant to the number of tasks. The task duration time distribution often exhibits compound periodicity and local exceptional fluctuations over time. Though transformer-based methods are one of the most powerful methods to capture these time series normal variation patterns, we empirically find and theoretically explain the flaw of the standard attention mechanism in reconstructing subperiods with low amplitude when dealing with compound periodicity. To tackle these challenges, we propose SORN (i.e., Skimming Off subperiods in descending amplitude order and Reconstructing Non-slowing fluctuation), which consists of a Skimming Attention mechanism to reconstruct the compound periodicity and a Neural Optimal Transport module to distinguish cluster-wide slowdowns from other exceptional fluctuations. Furthermore, since anomalies in the training set are inevitable in a practical scenario, we propose a picky loss function, which adaptively assigns higher weights to reliable time slots in the training set. Extensive experiments demonstrate that SORN outperforms state-of-the-art methods on multiple real-world industrial datasets.


Continuous Test-time Domain Adaptation for Efficient Fault Detection under Evolving Operating Conditions

arXiv.org Artificial Intelligence

Fault detection is crucial in industrial systems to prevent failures and optimize performance by distinguishing abnormal from normal operating conditions. Data-driven methods have been gaining popularity for fault detection tasks as the amount of condition monitoring data from complex industrial systems increases. Despite these advances, early fault detection remains a challenge under real-world scenarios. The high variability of operating conditions and environments makes it difficult to collect comprehensive training datasets that can represent all possible operating conditions, especially in the early stages of system operation. Furthermore, these variations often evolve over time, potentially leading to entirely new data distributions in the future that were previously unseen. These challenges prevent direct knowledge transfer across different units and over time, leading to the distribution gap between training and testing data and inducing performance degradation of those methods in real-world scenarios. To overcome this, our work introduces a novel approach for continuous test-time domain adaptation. This enables early-stage robust anomaly detection by addressing domain shifts and limited data representativeness issues. We propose a Test-time domain Adaptation Anomaly Detection (TAAD) framework that separates input variables into system parameters and measurements, employing two domain adaptation modules to independently adapt to each input category. This method allows for effective adaptation to evolving operating conditions and is particularly beneficial in systems with scarce data. Our approach, tested on a real-world pump monitoring dataset, shows significant improvements over existing domain adaptation methods in fault detection, demonstrating enhanced accuracy and reliability.