Goto

Collaborating Authors

 Overview


Bisimulation metric for Model Predictive Control

arXiv.org Artificial Intelligence

Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder. This time-step-wise direct optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details and preventing the gradients and errors from diverging. BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite, demonstrating superior performance and robustness compared to state-of-the-art baseline methods.


SafeLLM: Domain-Specific Safety Monitoring for Large Language Models: A Case Study of Offshore Wind Maintenance

arXiv.org Artificial Intelligence

The Offshore Wind (OSW) industry is experiencing significant expansion, resulting in increased Operations \& Maintenance (O\&M) costs. Intelligent alarm systems offer the prospect of swift detection of component failures and process anomalies, enabling timely and precise interventions that could yield reductions in resource expenditure, as well as scheduled and unscheduled downtime. This paper introduces an innovative approach to tackle this challenge by capitalising on Large Language Models (LLMs). We present a specialised conversational agent that incorporates statistical techniques to calculate distances between sentences for the detection and filtering of hallucinations and unsafe output. This potentially enables improved interpretation of alarm sequences and the generation of safer repair action recommendations by the agent. Preliminary findings are presented with the approach applied to ChatGPT-4 generated test sentences. The limitation of using ChatGPT-4 and the potential for enhancement of this agent through re-training with specialised OSW datasets are discussed.


Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated significant potential in clinical decision support. Yet LLMs still suffer from hallucinations and lack fine-grained contextual medical knowledge, limiting their high-stake healthcare applications such as clinical diagnosis. Traditional retrieval-augmented generation (RAG) methods attempt to address these limitations but frequently retrieve sparse or irrelevant information, undermining prediction accuracy. We introduce KARE, a novel framework that integrates knowledge graph (KG) community-level retrieval with LLM reasoning to enhance healthcare predictions. KARE constructs a comprehensive multi-source KG by integrating biomedical databases, clinical literature, and LLM-generated insights, and organizes it using hierarchical graph community detection and summarization for precise and contextually relevant information retrieval. Our key innovations include: (1) a dense medical knowledge structuring approach enabling accurate retrieval of relevant information; (2) a dynamic knowledge retrieval mechanism that enriches patient contexts with focused, multi-faceted medical insights; and (3) a reasoning-enhanced prediction framework that leverages these enriched contexts to produce both accurate and interpretable clinical predictions. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions. In addition to its impressive prediction accuracy, our framework leverages the reasoning capabilities of LLMs, enhancing the trustworthiness of clinical predictions.


Machine Learning-Assisted Intrusion Detection for Enhancing Internet of Things Security

arXiv.org Artificial Intelligence

Attacks against the Internet of Things (IoT) are rising as devices, applications, and interactions become more networked and integrated. The increase in cyber-attacks that target IoT networks poses a considerable vulnerability and threat to the privacy, security, functionality, and availability of critical systems, which leads to operational disruptions, financial losses, identity thefts, and data breaches. To efficiently secure IoT devices, real-time detection of intrusion systems is critical, especially those using machine learning to identify threats and mitigate risks and vulnerabilities. This paper investigates the latest research on machine learning-based intrusion detection strategies for IoT security, concentrating on real-time responsiveness, detection accuracy, and algorithm efficiency. Key studies were reviewed from all well-known academic databases, and a taxonomy was provided for the existing approaches. This review also highlights existing research gaps and outlines the limitations of current IoT security frameworks to offer practical insights for future research directions and developments.


Advancements in Robotics Process Automation: A Novel Model with Enhanced Empirical Validation and Theoretical Insights

arXiv.org Artificial Intelligence

Abstract: Robotics Process Automation (RPA) is revolutionizing business operations by significantly enhancing efficiency, productivity, and operational excellence across various industries. This manuscript delivers a comprehensive review of recent advancements in RPA technologies and proposes a novel model designed to elevate RPA capabilities. Incorporating cutting-edge artificial intelligence (AI) techniques, advanced machine learning algorithms, and strategic integration frameworks, the proposed model aims to push RPA's boundaries. The paper includes a detailed analysis of functionalities, implementation strategies, and expanded empirical validation through rigorous testing across multiple industries. Theoretical insights underpin the model's design, offering a robust framework for its application.


Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends

arXiv.org Artificial Intelligence

In recent years, interest in vision-language tasks has grown, especially those involving chart interactions. These tasks are inherently multimodal, requiring models to process chart images, accompanying text, underlying data tables, and often user queries. Traditionally, Chart Understanding (CU) relied on heuristics and rule-based systems. However, recent advancements that have integrated transformer architectures significantly improved performance. This paper reviews prominent research in CU, focusing on State-of-The-Art (SoTA) frameworks that employ transformers within End-to-End (E2E) solutions. Relevant benchmarking datasets and evaluation techniques are analyzed. Additionally, this article identifies key challenges and outlines promising future directions for advancing CU solutions. Following the PRISMA guidelines, a comprehensive literature search is conducted across Google Scholar, focusing on publications from Jan'20 to Jun'24. After rigorous screening and quality assessment, 32 studies are selected for in-depth analysis. The CU tasks are categorized into a three-layered paradigm based on the cognitive task required. Recent advancements in the frameworks addressing various CU tasks are also reviewed. Frameworks are categorized into single-task or multi-task based on the number of tasks solvable by the E2E solution. Within multi-task frameworks, pre-trained and prompt-engineering-based techniques are explored. This review overviews leading architectures, datasets, and pre-training tasks. Despite significant progress, challenges remain in OCR dependency, handling low-resolution images, and enhancing visual reasoning. Future directions include addressing these challenges, developing robust benchmarks, and optimizing model efficiency. Additionally, integrating explainable AI techniques and exploring the balance between real and synthetic data are crucial for advancing CU research.


Reasoning with Natural Language Explanations

arXiv.org Artificial Intelligence

Explanation constitutes an archetypal feature of human rationality, underpinning learning and generalisation, and representing one of the media supporting scientific discovery and communication. Due to the importance of explanations in human reasoning, an increasing amount of research in Natural Language Inference (NLI) has started reconsidering the role that explanations play in learning and inference, attempting to build explanation-based NLI models that can effectively encode and use natural language explanations on downstream tasks. Research in explanation-based NLI, however, presents specific challenges and opportunities, as explanatory reasoning reflects aspects of both material and formal inference, making it a particularly rich setting to model and deliver complex reasoning. In this tutorial, we provide a comprehensive introduction to the field of explanation-based NLI, grounding this discussion on the epistemological-linguistic foundations of explanations, systematically describing the main architectural trends and evaluation methodologies that can be used to build systems capable of explanatory reasoning.


Learning Low-Dimensional Metrics

Neural Information Processing Systems

This paper investigates the theoretical foundations of metric learning, focused on three key questions that are not fully addressed in prior work: 1) we consider learning general low-dimensional (low-rank) metrics as well as sparse metrics; 2) we develop upper and lower (minimax) bounds on the generalization error; 3) we quantify the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric; 4) we also bound the accuracy of the learned metric relative to the underlying true generative metric. All the results involve novel mathematical approaches to the metric learning problem, and also shed new light on the special case of ordinal embedding (aka non-metric multidimensional scaling).


Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

Neural Information Processing Systems

With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.


Enhancing Autonomous Navigation by Imaging Hidden Objects using Single-Photon LiDAR

arXiv.org Artificial Intelligence

Robust autonomous navigation in environments with limited visibility remains a critical challenge in robotics. We present a novel approach that leverages Non-Line-of-Sight (NLOS) sensing using single-photon LiDAR to improve visibility and enhance autonomous navigation. Our method enables mobile robots to "see around corners" by utilizing multi-bounce light information, effectively expanding their perceptual range without additional infrastructure. We propose a three-module pipeline: (1) Sensing, which captures multi-bounce histograms using SPAD-based LiDAR; (2) Perception, which estimates occupancy maps of hidden regions from these histograms using a convolutional neural network; and (3) Control, which allows a robot to follow safe paths based on the estimated occupancy. We evaluate our approach through simulations and real-world experiments on a mobile robot navigating an L-shaped corridor with hidden obstacles. Our work represents the first experimental demonstration of NLOS imaging for autonomous navigation, paving the way for safer and more efficient robotic systems operating in complex environments. We also contribute a novel dynamics-integrated transient rendering framework for simulating NLOS scenarios, facilitating future research in this domain.