Goto

Collaborating Authors

 Reinforcement Learning


Can a mobile robot learn from a pedestrian model to prevent the sidewalk salsa?

arXiv.org Artificial Intelligence

Pedestrians approaching each other on a sidewalk sometimes end up in an awkward interaction known as the "sidewalk salsa": they both (repeatedly) deviate to the same side to avoid a collision. This provides an interesting use case to study interactions between pedestrians and mobile robots because, in the vast majority of cases, this phenomenon is avoided through a negotiation based on implicit communication. Understanding how it goes wrong and how pedestrians end up in the sidewalk salsa will therefore provide insight into the implicit communication. This understanding can be used to design safe and acceptable robotic behaviour. In a previous attempt to gain this understanding, a model of pedestrian behaviour based on the Communication-Enabled Interaction (CEI) framework was developed that can replicate the sidewalk salsa. However, it is unclear how to leverage this model in robotic planning and decision-making since it violates the assumptions of game theory, a much-used framework in planning and decision-making. Here, we present a proof-of-concept for an approach where a Reinforcement Learning (RL) agent leverages the model to learn how to interact with pedestrians. The results show that a basic RL agent successfully learned to interact with the CEI model. Furthermore, a risk-averse RL agent that had access to the perceived risk of the CEI model learned how to effectively communicate its intention through its motion and thereby substantially lowered the perceived risk, and displayed effort by the modelled pedestrian. These results show this is a promising approach and encourage further exploration.


Spiking Decision Transformers: Local Plasticity, Phase-Coding, and Dendritic Routing for Low-Power Sequence Control

arXiv.org Artificial Intelligence

Reinforcement learning agents based on Transformer architectures have achieved impressive performance on sequential decision-making tasks, but their reliance on dense matrix operations makes them ill-suited for energy-constrained, edge-oriented platforms. Spiking neural networks promise ultra-low-power, event-driven inference, yet no prior work has seamlessly merged spiking dynamics with return-conditioned sequence modeling. We present the Spiking Decision Transformer (SNN-DT), which embeds Leaky Integrate-and-Fire neurons into each self-attention block, trains end-to-end via surrogate gradients, and incorporates biologically inspired three-factor plasticity, phase-shifted spike-based positional encodings, and a lightweight dendritic routing module. Our implementation matches or exceeds standard Decision Transformer performance on classic control benchmarks (CartPole-v1, MountainCar-v0, Acrobot-v1, Pendulum-v1) while emitting fewer than ten spikes per decision, an energy proxy suggesting over four orders-of-magnitude reduction in per inference energy. By marrying sequence modeling with neuromorphic efficiency, SNN-DT opens a pathway toward real-time, low-power control on embedded and wearable devices.


Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning

arXiv.org Artificial Intelligence

Uncertainty quantification in reinforcement learning can greatly improve exploration and robustness. Approximate Bayesian approaches have recently been popularized to quantify uncertainty in model-free algorithms. However, so far the focus has been on improving the accuracy of the posterior approximation, instead of studying the accuracy of the prior and likelihood assumptions underlying the posterior. In this work, we demonstrate that there is a cold posterior effect in Bayesian deep Q-learning, where contrary to theory, performance increases when reducing the temperature of the posterior. To identify and overcome likely causes, we challenge common assumptions made on the likelihood and priors in Bayesian model-free algorithms. We empirically study prior distributions and show through statistical tests that the common Gaussian likelihood assumption is frequently violated. We argue that developing more suitable likelihoods and priors should be a key focus in future Bayesian reinforcement learning research and we offer simple, implementable solutions for better priors in deep Q-learning that lead to more performant Bayesian algorithms.


Beyond expected value: geometric mean optimization for long-term policy performance in reinforcement learning

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms typically optimize the expected cumulative reward, i.e., the expected value of the sum of scalar rewards an agent receives over the course of a trajectory. The expected value averages the performance over an infinite number of trajectories. However, when deploying the agent in the real world, this ensemble average may be uninformative for the performance of individual trajectories. Thus, in many applications, optimizing the long-term performance of individual trajectories might be more desirable. In this work, we propose a novel RL algorithm that combines the standard ensemble average with the time-average growth rate, a measure for the long-term performance of individual trajectories. We first define the Bellman operator for the time-average growth rate. We then show that, under multiplicative reward dynamics, the geometric mean aligns with the time-average growth rate. To address more general and unknown reward dynamics, we propose a modified geometric mean with $N$-sliding window that captures the path-dependency as an estimator for the time-average growth rate. This estimator is embedded as a regularizer into the objective, forming a practical algorithm and enabling the policy to benefit from ensemble average and time-average simultaneously. We evaluate our algorithm in challenging simulations, where it outperforms conventional RL methods.


Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.


Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs

arXiv.org Artificial Intelligence

Recommender systems struggle to provide accurate suggestions to new users with limited interaction history, a challenge known as the cold-user problem. This paper proposes a reinforcement learning approach using Double and Dueling Deep Q-Networks (DQN) to dynamically learn user preferences from sparse feedback, enhancing recommendation accuracy without relying on sensitive demographic data. By integrating these advanced DQN variants with a matrix factorization model, we achieve superior performance on a large e-commerce dataset compared to traditional methods like popularity-based and active learning strategies. Experimental results show that our method, particularly Dueling DQN, reduces Root Mean Square Error (RMSE) for cold users, offering an effective solution for privacy-constrained environments.


Reinforcement Learning for Optimizing Large Qubit Array based Quantum Sensor Circuits

arXiv.org Artificial Intelligence

As the number of qubits in a sensor increases, the complexity of designing and controlling the quantum circuits grows exponentially. Manually optimizing these circuits becomes infeasible. Optimizing entanglement distribution in large-scale quantum circuits is critical for enhancing the sensitivity and efficiency of quantum sensors [5], [6]. This paper presents an engineering integration of reinforcement learning with tensor-network-based simulation (MPS) for scalable circuit optimization for optimizing quantum sensor circuits with up to 60 qubits. To enable efficient simulation and scalability, we adopt tensor network methods, specifically the Matrix Product State (MPS) representation, instead of traditional state vector or density matrix approaches. Our reinforcement learning agent learns to restructure circuits to maximize Quantum Fisher Information (QFI) and entanglement entropy while reducing gate counts and circuit depth. Experimental results show consistent improvements, with QFI values approaching 1, entanglement entropy in the 0.8-1.0 range, and up to 90% reduction in depth and gate count. These results highlight the potential of combining quantum machine learning and tensor networks to optimize complex quantum circuits under realistic constraints.


Quantum Machine Learning for Optimizing Entanglement Distribution in Quantum Sensor Circuits

arXiv.org Artificial Intelligence

In the rapidly evolving field of quantum computing, optimizing quantum circuits for specific tasks is crucial for enhancing performance and efficiency. More recently, quantum sensing has become a distinct and rapidly growing branch of research within the area of quantum science and technology. The field is expected to provide new opportunities, especially regarding high sensitivity and precision. Entanglement is one of the key factors in achieving high sensitivity and measurement precision [3]. This paper presents a novel approach utilizing quantum machine learning techniques to optimize entanglement distribution in quantum sensor circuits. By leveraging reinforcement learning within a quantum environment, we aim to optimize the entanglement layout to maximize Quantum Fisher Information (QFI) and entanglement entropy, which are key indicators of a quantum system's sensitivity and coherence, while minimizing circuit depth and gate counts. Our implementation, based on Qiskit, integrates noise models and error mitigation strategies to simulate realistic quantum environments. The results demonstrate significant improvements in circuit performance and sensitivity, highlighting the potential of machine learning in quantum circuit optimization by measuring high QFI and entropy in the range of 0.84-1.0 with depth and gate count reduction by 20-86%.


HCQA: Hybrid Classical-Quantum Agent for Generating Optimal Quantum Sensor Circuits

arXiv.org Artificial Intelligence

Abstract--This study proposes an HCQA for designing optimal Quantum Sensor Circuits (QSCs) to address complex quantum physics problems. The HCQA integrates computational intelligence techniques by leveraging a Deep Q-Network (DQN) for learning and policy optimization, enhanced by a quantum-based action selection mechanism based on the Q-values. Measurement of the circuit results in probabilistic action outcomes, allowing the agent to generate optimal QSCs by selecting sequences of gates that maximize the Quantum Fisher Information (QFI) while minimizing the number of gates. This computational intelligence-driven HCQA enables the automated generation of entangled quantum states, specifically the squeezed states, with high QFI sensitivity for quantum state estimation and control. This work highlights the synergy between AI-driven learning and quantum computation, illustrating how intelligent agents can autonomously discover optimal quantum circuit designs for enhanced sensing and estimation tasks. Impact Statement--The HCQA introduces a hybrid AIquantum framework for generating optimal QSCs, contributing to foundational advances in quantum metrology and intelligent quantum control. By integrating a DQN with quantum-based action selection, the HCQA learns to construct quantum circuits that achieve high QFI with reduced gate complexity. This approach demonstrates how reinforcement learning can guide quantum circuit synthesis in a goal-directed, data-efficient manner. While this work is demonstrated on a simplified two-qubit, noise-free simulation, it provides a proof of concept for how intelligent agents can autonomously learn and optimize QSCs. Technologically, this contributes to the growing field of Quantum Reinforcement Learning (QRL) and supports future exploration of scalable, noise-resilient extensions.


Automating the Deep Space Network Data Systems; A Case Study in Adaptive Anomaly Detection through Agentic AI

arXiv.org Artificial Intelligence

The Deep Space Network (DSN) is NASA's largest network of antenna facilities that generate a large volume of multivariate time - series data. These facilities, situated at Canberra - Australia, Madrid - Spain, and Goldstone - California, contain DSN antennas and transmitters that undergo degradation over long periods of time, which may cause costly disruptions to the data flow and threaten the earth - connection of dozens of spacecraft that rely on the Deep Space Network for their lifeline. The purpose of this study was to experiment with different methods and tools that would be able to assist JPL engineers with directly pinpointing anomalies and DSN equipment degradation through collected data, and continue conducting maintenance and operations of the DSN for future space missions around our universe. As such, we have researched various machine learning techniques and architectures that can fully reconstruct data through predictive analysis, and determine anomalous data entries within real - time datasets through statistical computations and thresholds . On top of the fully trained and tested machine learning models, we have also integrated the use of a reinforcement learning subsystem that classifies identified anomalies based on severity level and a Large Language Model that labels an explanation for each anomalous data entry, all of which can be improved and fine - tuned over time through human feedback /input. Specifically, for the DSN transmitters, we have also implemented a full data pipeline system that connects the data extraction, parsing, and pro ces sing workflow all together as there was no coherent program or script for performing these tasks before. Using this data pipeline system, we were able to then also connect the models trained from DSN an tenna data, completing the data workflow for DSN anomaly detection . This was all wrapped around and further connected by an agentic AI system, where complex reasoning was utilized to determine the classifications and predictions of anomalous data .