Goto

Collaborating Authors

 Reinforcement Learning


ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition

arXiv.org Artificial Intelligence

Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel processors. Prior works rely on expert-designed sparse tensor formats that fail to adapt to irregular tensor shapes and/or highly variable data distributions. We present the reinforcement-learned adaptive tensor encoding (ReLATE) framework, a novel learning-augmented method that automatically constructs efficient sparse tensor representations without labeled training samples. ReLATE employs an autonomous agent that discovers optimized tensor encodings through direct interaction with the TD environment, leveraging a hybrid model-free and model-based algorithm to learn from both real and imagined actions. Moreover, ReLATE introduces rule-driven action masking and dynamics-informed action filtering mechanisms that ensure functionally correct tensor encoding with bounded execution time, even during early learning stages. By automatically adapting to both irregular tensor shapes and data distributions, ReLATE generates sparse tensor representations that consistently outperform expert-designed formats across diverse sparse tensor data sets, achieving up to 2X speedup compared to the best sparse format, with a geometric-mean speedup of 1.4-1.46X.


The Temporal Game: A New Perspective on Temporal Relation Extraction

arXiv.org Artificial Intelligence

In this paper we demo the Temporal Game, a novel approach to temporal relation extraction that casts the task as an interactive game. Instead of directly annotating interval-level relations, our approach decomposes them into point-wise comparisons between the start and end points of temporal entities. At each step, players classify a single point relation, and the system applies temporal closure to infer additional relations and enforce consistency. This point-based strategy naturally supports both interval and instant entities, enabling more fine-grained and flexible annotation than any previous approach. The Temporal Game also lays the groundwork for training reinforcement learning agents, by treating temporal annotation as a sequential decision-making task. To showcase this potential, the demo presented in this paper includes a Game mode, in which users annotate texts from the TempEval-3 dataset and receive feedback based on a scoring system, and an Annotation mode, that allows custom documents to be annotated and resulting timeline to be exported. Therefore, this demo serves both as a research tool and an annotation interface. The demo is publicly available at https://temporal-game.inesctec.pt, and the source code is open-sourced to foster further research and community-driven development in temporal reasoning and annotation.


Financial Decision Making using Reinforcement Learning with Dirichlet Priors and Quantum-Inspired Genetic Optimization

arXiv.org Artificial Intelligence

Traditional budget allocation models struggle with the stochastic and nonlinear nature of real-world financial data. This study proposes a hybrid reinforcement learning (RL) framework for dynamic budget allocation, enhanced with Dirichlet-inspired stochasticity and quantum mutation-based genetic optimization. Using Apple Inc. quarterly financial data (2009 to 2025), the RL agent learns to allocate budgets between Research and Development and Selling, General and Administrative to maximize profitability while adhering to historical spending patterns, with L2 penalties discouraging unrealistic deviations. A Dirichlet distribution governs state evolution to simulate shifting financial contexts. To escape local minima and improve generalization, the trained policy is refined using genetic algorithms with quantum mutation via parameterized qubit rotation circuits. Generation-wise rewards and penalties are logged to visualize convergence and policy behavior. On unseen fiscal data, the model achieves high alignment with actual allocations (cosine similarity 0.9990, KL divergence 0.0023), demonstrating the promise of combining deep RL, stochastic modeling, and quantum-inspired heuristics for adaptive enterprise budgeting.


HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

arXiv.org Artificial Intelligence

Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation. Nevertheless, translating multi-source human hand motions into feasible robot behaviors remains challenging, particularly for robots equipped with multi-fingered dexterous hands characterized by complex, high-dimensional action spaces. Moreover, existing approaches often struggle to produce policies capable of adapting to diverse environmental conditions. In this paper, we introduce HERMES, a human-to-robot learning framework for mobile bimanual dexterous manipulation. First, HERMES formulates a unified reinforcement learning approach capable of seamlessly transforming heterogeneous human hand motions from multiple sources into physically plausible robotic behaviors. Subsequently, to mitigate the sim2real gap, we devise an end-to-end, depth image-based sim2real transfer method for improved generalization to real-world scenarios. Furthermore, to enable autonomous operation in varied and unstructured environments, we augment the navigation foundation model with a closed-loop Perspective-n-Point (PnP) localization mechanism, ensuring precise alignment of visual goals and effectively bridging autonomous navigation and dexterous manipulation. Extensive experimental results demonstrate that HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios, successfully performing numerous complex mobile bimanual dexterous manipulation tasks. Project Page:https://gemcollector.github.io/HERMES/.


Reinforcement Learning-based Adaptive Path Selection for Programmable Networks

arXiv.org Artificial Intelligence

This work presents a proof-of-concept implementation of a distributed, in-network reinforcement learning (IN-RL) framework for adaptive path selection in programmable networks. By combining Stochastic Learning Automata (SLA) with real-time telemetry data collected via In-Band Network Telemetry (INT), the proposed system enables local, data-driven forwarding decisions that adapt dynamically to congestion conditions. The system is evaluated on a Mininet-based testbed using P4-programmable BMv2 switches, demonstrating how our SLA-based mechanism converges to effective path selections and adapts to shifting network conditions at line rate.


Efficient Manipulation-Enhanced Semantic Mapping With Uncertainty-Informed Action Selection

arXiv.org Artificial Intelligence

-- Service robots operating in cluttered human environments such as homes, offices, and schools cannot rely on predefined object arrangements and must continuously update their semantic and spatial estimates while dealing with possible frequent rearrangements. Efficient and accurate mapping under such conditions demands selecting informative viewpoints and targeted manipulations to reduce occlusions and uncertainty. In this work, we present a manipulation-enhanced semantic mapping framework for occlusion-heavy shelf scenes that integrates evidential metric-semantic mapping with reinforcement-learning-based next-best view planning and targeted action selection. Our method thereby exploits uncertainty estimates from Dirichlet and Beta distributions in the map prediction networks to guide both active sensor placement and object manipulation, focusing on areas with high uncertainty and selecting actions with high expected information gain. Furthermore, we introduce an uncertainty-informed push strategy that targets occlusion-critical objects and generates minimally invasive actions to reveal hidden regions by reducing overall uncertainty in the scene. The experimental evaluation shows that our framework enables to accurately map cluttered scenes, while substantially reducing object displacement and achieving a 95% reduction in planning time compared to the state-of-the-art, thereby realizing real-world applicability. The successful deployment of general-purpose service robots in homes and offices relies on perceiving and manipulating diverse objects in cluttered and constrained spaces. Robots must move beyond passive perception (e.g., mapping with a static camera) and instead apply active perception [1] in combination with deliberate physical object interactions [2].


Motion Priors Reimagined: Adapting Flat-Terrain Skills for Complex Quadruped Mobility

arXiv.org Artificial Intelligence

Reinforcement learning (RL)-based motion imitation methods trained on demonstration data can effectively learn natural and expressive motions with minimal reward engineering but often struggle to generalize to novel environments. We address this by proposing a hierarchical RL framework in which a low-level policy is first pre-trained to imitate animal motions on flat ground, thereby establishing motion priors. A subsequent high-level, goal-conditioned policy then builds on these priors, learning residual corrections that enable perceptive locomotion, local obstacle avoidance, and goal-directed navigation across diverse and rugged terrains. Simulation experiments illustrate the effectiveness of learned residuals in adapting to progressively challenging uneven terrains while still preserving the locomotion characteristics provided by the motion priors. Furthermore, our results demonstrate improvements in motion regularization over baseline models trained without motion priors under similar reward setups. Real-world experiments with an ANYmal-D quadruped robot confirm our policy's capability to generalize animal-like locomotion skills to complex terrains, demonstrating smooth and efficient locomotion and local navigation performance amidst challenging terrains with obstacles.


Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning

arXiv.org Artificial Intelligence

Matched filters are widely used to localise signal patterns due to their high efficiency and interpretability. However, their effectiveness deteriorates for low signal-to-noise ratio (SNR) signals, such as those recorded on edge devices, where prominent noise patterns can closely resemble the target within the limited length of the filter. One example is the ear-electrocardiogram (ear-ECG), where the cardiac signal is attenuated and heavily corrupted by artefacts. To address this, we propose the Sequential Matched Filter (SMF), a paradigm that replaces the conventional single matched filter with a sequence of filters designed by a Reinforcement Learning agent. By formulating filter design as a sequential decision-making process, SMF adaptively design signal-specific filter sequences that remain fully interpretable by revealing key patterns driving the decision-making. The proposed SMF framework has strong potential for reliable and interpretable clinical decision support, as demonstrated by its state-of-the-art R-peak detection and physiological state classification performance on two challenging real-world ECG datasets. The proposed formulation can also be extended to a broad range of applications that require accurate pattern localisation from noise-corrupted signals.


Multi-critic Learning for Whole-body End-effector Twist Tracking

arXiv.org Artificial Intelligence

Learning whole-body control for locomotion and arm motions in a single policy has challenges, as the two tasks have conflicting goals. For instance, efficient locomotion typically favors a horizontal base orientation, while end-effector tracking may benefit from base tilting to extend reachability. Additionally, current Reinforcement Learning (RL) approaches using a pose-based task specification lack the ability to directly control the end-effector velocity, making smoothly executing trajectories very challenging. To address these limitations, we propose an RL-based framework that allows for dynamic, velocity-aware whole-body end-effector control. Our method introduces a multi-critic actor architecture that decouples the reward signals for locomotion and manipulation, simplifying reward tuning and allowing the policy to resolve task conflicts more effectively. Furthermore, we design a twist-based end-effector task formulation that can track both discrete poses and motion trajectories. We validate our approach through a set of simulation and hardware experiments using a quadruped robot equipped with a robotic arm. The resulting controller can simultaneously walk and move its end-effector and shows emergent whole-body behaviors, where the base assists the arm in extending the workspace, despite a lack of explicit formulations. Videos and supplementary material can be found at multi-critic-locomanipulation.github.io.


Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

arXiv.org Artificial Intelligence

Vision is well-known for its use in manipulation, especially using visual servoing. Due to the 3D nature of the world, using multiple camera views and merging them creates better representations for Q-learning and in turn, trains more sample efficient policies. Nevertheless, these multi-view policies are sensitive to failing cameras and can be burdensome to deploy. To mitigate these issues, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For project website and code, see https://aalmuzairee.github.io/mad