function approximator
Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
Deep reinforcement learning (RL) has achieved remarkable success in solving complex tasks through its integration with deep neural networks (DNNs) as function approximators. However, the reliance on DNNs has introduced a new challenge called primacy bias, whereby these function approximators tend to prioritize early experiences, leading to overfitting. To alleviate this bias, a reset method has been proposed, which involves periodic resets of a portion or the entirety of a deep RL agent while preserving the replay buffer. Yet this method can cause performance collapse immediately after each reset, raising concerns from the perspectives of safe RL and regret minimization. In this paper, we propose a novel reset-based method that leverages deep ensemble learning to address the limitations of the vanilla reset method and enhance sample efficiency. The effectiveness of the proposed method is validated through various experiments, including those in the domain of safe RL. Numerical results demonstrate its potential for real-world applications requiring high sample efficiency and safety considerations.
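The vanilla reset mechanism discussed in this abstract can be summarized in a few lines. The sketch below is only an illustration under assumed details (the network architecture, reset interval, and training loop are placeholders, not the paper's setup): the agent's weights are periodically re-initialized while the replay buffer is left untouched.

    import torch.nn as nn

    def reset_network(net: nn.Module) -> None:
        """Re-initialize every submodule that exposes reset_parameters()."""
        for layer in net.modules():
            if hasattr(layer, "reset_parameters"):
                layer.reset_parameters()

    q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    replay_buffer = []            # preserved across resets
    RESET_INTERVAL = 100_000      # hypothetical choice of reset period

    for step in range(1, 1_000_001):
        # ... collect a transition into replay_buffer and run a gradient step ...
        if step % RESET_INTERVAL == 0:
            reset_network(q_net)  # weights re-initialized; replay_buffer kept

The proposed ensemble variant would maintain several such agents and stagger or coordinate their resets, which is what mitigates the post-reset collapse of a single agent.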
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. At the core of their empirical successes is the learned feature representation, which embeds rich observations, e.g., images and texts, into the latent space that encodes semantic structures. Meanwhile, the evolution of such a feature representation is crucial to the convergence of temporal-difference and Q-learning. In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise. We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve?
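For concreteness, the fixed-feature setting in which temporal-difference learning is known to converge corresponds to the linear TD(0) update sketched below; the feature map, step size, and discount factor are illustrative assumptions, not quantities from the paper.

    import numpy as np

    def td0_update(w, phi_s, r, phi_s_next, gamma=0.99, alpha=0.01):
        """One TD(0) step for V(s) = w . phi(s) with a fixed feature map phi."""
        td_error = r + gamma * (phi_s_next @ w) - (phi_s @ w)
        return w + alpha * td_error * phi_s

    # Example usage with random 8-dimensional features.
    w = np.zeros(8)
    w = td0_update(w, phi_s=np.random.rand(8), r=1.0, phi_s_next=np.random.rand(8))

The paper's question is what replaces the fixed phi when the approximator is a neural network whose features themselves evolve during training.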
BooVI: Provably Efficient Bootstrapped Value Iteration
Despite the tremendous success of reinforcement learning (RL) with function approximation, efficient exploration remains a significant challenge, both practically and theoretically. In particular, existing theoretically grounded RL algorithms based on upper confidence bounds (UCBs), such as optimistic least-squares value iteration (LSVI), are often incompatible with practically powerful function approximators, such as neural networks. In this paper, we develop a variant of bootstrapped LSVI, namely BooVI, which bridges such a gap between practice and theory.
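As a rough illustration of the gap the abstract describes: the core of LSVI is a ridge regression of Q-targets, and a bootstrapped variant can refit on resampled transitions rather than adding an explicit UCB bonus. The sketch below is an assumption-laden caricature (the function names, resampling scheme, and hyperparameters are illustrative), not the BooVI algorithm itself.

    import numpy as np

    def lsvi_fit(Phi, rewards, V_next, lam=1.0, gamma=0.99, rng=None):
        """Ridge-regress w so that Phi @ w approximates r + gamma * V_next.

        Passing a numpy Generator as rng resamples the transitions once,
        i.e. produces one bootstrap replicate of the fitted weights.
        """
        n, d = Phi.shape
        idx = np.arange(n) if rng is None else rng.integers(0, n, size=n)
        X, y = Phi[idx], rewards[idx] + gamma * V_next[idx]
        A = X.T @ X + lam * np.eye(d)
        return np.linalg.solve(A, X.T @ y)

Repeating the resampled fit across an ensemble of weight vectors gives randomized value estimates that can drive exploration in place of an explicit optimism bonus.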
Deep SOR Minimax Q-learning for Two-player Zero-sum Game
Saksham Gautam, Lakshmi Mandal, Shalabh Bhatnagar
In this work, we consider the problem of a two-player zero-sum game. In the literature, the successive over-relaxation (SOR) Q-learning algorithm has been developed and implemented, and it yields a lower contraction factor for the associated Q-Bellman operator, resulting in a faster value-iteration procedure. However, this algorithm has been presented only for the tabular case and not for the function-approximation setting that typically caters to real-world, high-dimensional state-action spaces; nor has it been considered for two-player zero-sum games. We thus propose a deep successive over-relaxation minimax Q-learning algorithm that incorporates deep neural networks as function approximators and is suitable for high-dimensional spaces. We prove the finite-time convergence of the proposed algorithm. Through numerical experiments, we show the effectiveness of the proposed method over the existing Q-learning algorithm. Our ablation studies demonstrate the effect of different values of the crucial successive over-relaxation parameter.
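To make the ingredients concrete, the sketch below combines a tabular minimax Q-learning update with an SOR-style target that blends the next-state value and the current-state value through a relaxation parameter w. This is an illustrative reconstruction under simplifying assumptions (pure-strategy stage-game values, tabular Q, arbitrary hyperparameters), not the deep algorithm proposed in the paper.

    import numpy as np

    def sor_minimax_q_update(Q, s, a, b, r, s_next, w=1.2, alpha=0.1, gamma=0.9):
        """One update of Q[s, a, b] for a two-player zero-sum Markov game.

        Q has shape (num_states, num_max_player_actions, num_min_player_actions).
        """
        def stage_value(state):
            # Value of the stage game at `state`, restricted to pure strategies:
            # the maximizer's best row against the minimizer's best column response.
            return np.max(np.min(Q[state], axis=1))

        target = w * (r + gamma * stage_value(s_next)) + (1.0 - w) * stage_value(s)
        Q[s, a, b] += alpha * (target - Q[s, a, b])
        return Q

In the deep variant described by the abstract, the tabular Q would be replaced by a neural network trained against such SOR-style minimax targets.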