return distribution
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Austria (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach
Mohammadi, Mehrdad, Zheng, Qi, Zhu, Ruoqing
We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matern family of kernels and provide uniform convergence guarantees. Simulations and empirical results demonstrate robust off-policy evaluation and recovery of the kernel mean embedding under mild assumptions, namely, Lipschitz continuity and boundedness of the kernels, highlighting the potential of embedding-based approaches in complex real-world decision-making scenarios and risk evaluation.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- North America > United States (0.15)
- Asia > China > Hong Kong (0.04)
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases.
RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization
Fukazawa, Kai, Mundada, Kunal, Soltani, Iman
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value or model-based pessimism, and restricted policy classes that limit policy expressiveness, whereas diffusion/flow-based expressive generative policies trained with a behavioral-cloning (BC) objective have been used only in risk-neutral settings. Here, we address this gap by introducing the \textbf{Risk-Aware Multimodal Actor-Critic (RAMAC)}, which couples an expressive generative actor with a distributional critic and, to our knowledge, is the first model-free approach that learns \emph{risk-aware expressive generative policies}. RAMAC differentiates a composite objective that adds a Conditional Value-at-Risk (CVaR) term to a BC loss, achieving risk-sensitive learning in complex multimodal scenarios. Since out-of-distribution (OOD) actions are a major driver of catastrophic failures in offline RL, we further analyze OOD behavior under prior-anchored perturbation schemes from recent BC-regularized risk-averse offline RL. This clarifies why a behavior-regularized objective that directly constrains the expressive generative policy to the dataset support provides an effective, risk-agnostic mechanism for suppressing OOD actions in modern expressive policies. We instantiate RAMAC with a diffusion-based actor, using it both to illustrate the analysis in a 2-D risky bandit and to deploy OOD-action detectors on Stochastic-D4RL benchmarks, empirically validating our insights. Across these tasks, we observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns. Our implementation is available at GitHub: https://github.com/KaiFukazawa/RAMAC.git
- North America > United States > California > Yolo County > Davis (0.04)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)