Markov Models
Learning Flexible Heterogeneous Coordination with Capability-Aware Shared Hypernetworks
Fu, Kevin, Howell, Pierce, Jain, Shalin, Ravichandar, Harish
Cooperative heterogeneous multi-agent tasks require agents to effectively coordinate their behaviors while accounting for their relative capabilities. Learning-based solutions to this challenge span between two extremes: i) shared-parameter methods, which encode diverse behaviors within a single architecture by assigning an ID to each agent, and are sample-efficient but result in limited behavioral diversity; ii) independent methods, which learn a separate policy for each agent, and show greater behavioral diversity but lack sample-efficiency. Prior work has also explored selective parameter-sharing, allowing for a compromise between diversity and efficiency. None of these approaches, however, effectively generalize to unseen agents or teams. We present Capability-Aware Shared Hypernetworks (CASH), a novel architecture for heterogeneous multi-agent coordination that generates sufficient diversity while maintaining sample-efficiency via soft parameter-sharing hypernetworks. Intuitively, CASH allows the team to learn common strategies using a shared encoder, which are then adapted according to the team's individual and collective capabilities with a hypernetwork, allowing for zero-shot generalization to unseen teams and agents. We present experiments across two heterogeneous coordination tasks and three standard learning paradigms (imitation learning, on- and off-policy reinforcement learning). CASH is able to outperform baseline architectures in success rate and sample efficiency when evaluated on unseen teams and agents despite using less than half of the learnable parameters.
A Multi-Layer CNN-GRUSKIP model based on transformer for spatial TEMPORAL traffic flow prediction
Ata, Karimeh Ibrahim Mohammad, Hassan, Mohd Khair, Ismaeel, Ayad Ghany, Al-Haddad, Syed Abdul Rahman, Alquthami, Thamer, Alani, Sameer
Traffic flow prediction remains a cornerstone for intelligent transportation systems ITS, influencing both route optimization and environmental efforts. While Recurrent Neural Networks RNN and traditional Convolutional Neural Networks CNN offer some insights into the spatial temporal dynamics of traffic data, they are often limited when navigating sparse and extended spatial temporal patterns. In response, the CNN-GRUSKIP model emerges as a pioneering approach. Notably, it integrates the GRU-SKIP mechanism, a hybrid model that leverages the Gate Recurrent Unit of GRU capabilities to process sequences with the SKIP feature of ability to bypass and connect longer temporal dependencies, making it especially potent for traffic flow predictions with erratic and extended patterns. Another distinctive aspect is its non-standard 6-layer CNN, meticulously designed for in-depth spatiotemporal correlation extraction. The model comprises (1) the specialized CNN feature extraction, (2) the GRU-SKIP enhanced long-temporal module adept at capturing extended patterns, (3) a transformer module employing encoder-decoder and multi-attention mechanisms to hone prediction accuracy and trim model complexity, and (4) a bespoke prediction module. When tested against real-world datasets from California of Caltrans Performance Measurement System PeMS, specifically PeMS districts 4 and 8, the CNN-GRUSKIP consistently outperformed established models such as ARIMA, Graph Wave Net, HA, LSTM, STGCN, and APTN. With its potent predictive prowess and adaptive architecture, the CNN-GRUSKIP model stands to redefine ITS applications, especially where nuanced traffic dynamics are in play.
Generative Flow Networks: Theory and Applications to Structure Learning
Without any assumptions about data generation, multiple causal models may explain our observations equally well. To avoid selecting a single arbitrary model that could result in unsafe decisions if it does not match reality, it is therefore essential to maintain a notion of epistemic uncertainty about our possible candidates. This thesis studies the problem of structure learning from a Bayesian perspective, approximating the posterior distribution over the structure of a causal model, represented as a directed acyclic graph (DAG), given data. It introduces Generative Flow Networks (GFlowNets), a novel class of probabilistic models designed for modeling distributions over discrete and compositional objects such as graphs. They treat generation as a sequential decision making problem, constructing samples of a target distribution defined up to a normalization constant piece by piece. In the first part of this thesis, we present the mathematical foundations of GFlowNets, their connections to existing domains of machine learning and statistics such as variational inference and reinforcement learning, and their extensions beyond discrete problems. In the second part of this thesis, we show how GFlowNets can approximate the posterior distribution over DAG structures of causal Bayesian Networks, along with the parameters of its causal mechanisms, given observational and experimental data.
CoDe: Communication Delay-Tolerant Multi-Agent Collaboration via Dual Alignment of Intent and Timeliness
Song, Shoucheng, Lin, Youfang, Han, Sheng, Yao, Chang, Wu, Hao, Wang, Shuo, Lv, Kai
Communication has been widely employed to enhance multi-agent collaboration. Previous research has typically assumed delay-free communication, a strong assumption that is challenging to meet in practice. However, real-world agents suffer from channel delays, receiving messages sent at different time points, termed {\it{Asynchronous Communication}}, leading to cognitive biases and breakdowns in collaboration. This paper first defines two communication delay settings in MARL and emphasizes their harm to collaboration. To handle the above delays, this paper proposes a novel framework, Communication Delay-tolerant Multi-Agent Collaboration (CoDe). At first, CoDe learns an intent representation as messages through future action inference, reflecting the stable future behavioral trends of the agents. Then, CoDe devises a dual alignment mechanism of intent and timeliness to strengthen the fusion process of asynchronous messages. In this way, agents can extract the long-term intent of others, even from delayed messages, and selectively utilize the most recent messages that are relevant to their intent. Experimental results demonstrate that CoDe outperforms baseline algorithms in three MARL benchmarks without delay and exhibits robustness under fixed and time-varying delays.
Constrained Optimization of Charged Particle Tracking with Multi-Agent Reinforcement Learning
Kortus, Tobias, Keidel, Ralf, Gauger, Nicolas R., Kieseler, Jan
Reinforcement learning demonstrated immense success in modelling complex physics-driven systems, providing end-to-end trainable solutions by interacting with a simulated or real environment, maximizing a scalar reward signal. In this work, we propose, building upon previous work, a multi-agent reinforcement learning approach with assignment constraints for reconstructing particle tracks in pixelated particle detectors. Our approach optimizes collaboratively a parametrized policy, functioning as a heuristic to a multidimensional assignment problem, by jointly minimizing the total amount of particle scattering over the reconstructed tracks in a readout frame. To satisfy constraints, guaranteeing a unique assignment of particle hits, we propose a safety layer solving a linear assignment problem for every joint action. Further, to enforce cost margins, increasing the distance of the local policies predictions to the decision boundaries of the optimizer mappings, we recommend the use of an additional component in the blackbox gradient estimation, forcing the policy to solutions with lower total assignment costs. We empirically show on simulated data, generated for a particle detector developed for proton imaging, the effectiveness of our approach, compared to multiple single- and multi-agent baselines. We further demonstrate the effectiveness of constraints with cost margins for both optimization and generalization, introduced by wider regions with high reconstruction performance as well as reduced predictive instabilities. Our results form the basis for further developments in RL-based tracking, offering both enhanced performance with constrained policies and greater flexibility in optimizing tracking algorithms through the option for individual and team rewards.
RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
Zhao, Yujie, Escamill, Jose Efraim Aguilar, Lu, Weyl, Wang, Huazheng
Reinforcement Learning from Human Feedback (RLHF) has recently surged in popularity, particularly for aligning large language models and other AI systems with human intentions. At its core, RLHF can be viewed as a specialized instance of Preference-based Reinforcement Learning (PbRL), where the preferences specifically originate from human judgments rather than arbitrary evaluators. Despite this connection, most existing approaches in both RLHF and PbRL primarily focus on optimizing a mean reward objective, neglecting scenarios that necessitate risk-awareness, such as AI safety, healthcare, and autonomous driving. These scenarios often operate under a one-episode-reward setting, which makes conventional risk-sensitive objectives inapplicable. To address this, we explore and prove the applicability of two risk-aware objectives to PbRL: nested and static quantile risk objectives. We also introduce Risk-Aware-PbRL (RA-PbRL), an algorithm designed to optimize both nested and static objectives. Additionally, we provide a theoretical analysis of the regret upper bounds, demonstrating that they are sublinear with respect to the number of episodes, and present empirical results to support our findings.
AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design
Junior, Francisco Erivaldo Fernandes, Oulasvirta, Antti
Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex ways, optimizing them is a black-box problem that proves especially challenging for nonexperts. Although existing optimization-as-a-service platforms (e.g., Vizier and Optuna) can handle such problems, they are impractical for RL systems, since the need for manual user mapping of each parameter to distinct components makes the effort cumbersome. It also requires understanding of the optimization process, limiting the systems' application beyond the machine learning field and restricting access in areas such as cognitive science, which models human decision-making. To tackle these challenges, the paper presents AgentForge, a flexible low-code platform to optimize any parameter set across an RL system. Available at https://github.com/feferna/AgentForge, it allows an optimization problem to be defined in a few lines of code and handed to any of the interfaced optimizers. With AgentForge, the user can optimize the parameters either individually or jointly. The paper presents an evaluation of its performance for a challenging vision-based RL problem.
Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning
Yu, Zhiyong, Jiang, Yuning, Liu, Xin, Shi, Yuanming, Jiang, Chunxiao, Kuang, Linling
With the growing demand for Earth observation, it is important to provide reliable real-time remote sensing inference services to meet the low-latency requirements. The Space Computing Power Network (Space-CPN) offers a promising solution by providing onboard computing and extensive coverage capabilities for real-time inference. This paper presents a remote sensing artificial intelligence applications deployment framework designed for Low Earth Orbit satellite constellations to achieve real-time inference performance. The framework employs the microservice architecture, decomposing monolithic inference tasks into reusable, independent modules to address high latency and resource heterogeneity. This distributed approach enables optimized microservice deployment, minimizing resource utilization while meeting quality of service and functional requirements. We introduce Robust Optimization to the deployment problem to address data uncertainty. Additionally, we model the Robust Optimization problem as a Partially Observable Markov Decision Process and propose a robust reinforcement learning algorithm to handle the semi-infinite Quality of Service constraints. Our approach yields sub-optimal solutions that minimize accuracy loss while maintaining acceptable computational costs. Simulation results demonstrate the effectiveness of our framework.
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
Shakhadri, Syed Abdul Gaffar, KR, Kruthika, Angadi, Kartik Basavaraj
The rapid evolution of deep learning has significantly transformed Automatic Speech Recognition (ASR), shifting from traditional systems such as Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to advanced end-to-end neural architectures. While innovations such as Connectionist Temporal Classification (CTC) and attentionbased encoder-decoder models have established new baselines [1], transformer-based models like OpenAI's Whisper have further pushed the boundaries, setting state-of-the-art benchmarks for multilingual, multitask ASR systems [2]. Despite their successes, transformer architectures face inherent challenges in scaling to long sequences, particularly those encountered in extended audio recordings.
Learning Stochastic Nonlinear Dynamics with Embedded Latent Transfer Operators
Ke, Naichang, Tanaka, Ryogo, Kawahara, Yoshinobu
We consider an operator-based latent Markov representation of a stochastic nonlinear dynamical system, where the stochastic evolution of the latent state embedded in a reproducing kernel Hilbert space is described with the corresponding transfer operator, and develop a spectral method to learn this representation based on the theory of stochastic realization. The embedding may be learned simultaneously using reproducing kernels, for example, constructed with feed-forward neural networks. We also address the generalization of sequential state-estimation (Kalman filtering) in stochastic nonlinear systems, and of operator-based eigen-mode decomposition of dynamics, for the representation. Several examples with synthetic and real-world data are shown to illustrate the empirical characteristics of our methods, and to investigate the performance of our model in sequential state-estimation and mode decomposition.