Nonlinear Function Approximation
Zap Q-Learning With Nonlinear Function Approximation
Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper introduces a new framework for the analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to the choice of function approximation architecture.
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > Iowa > Story County > Ames (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Overview (0.67)
- Research Report (0.46)
- Energy > Power Industry (0.67)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Communications > Networks (0.93)
- Information Technology > Artificial Intelligence > Robots (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
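To make the update concrete, here is a minimal sketch of a single Zap Q-learning step with a differentiable Q-function approximator. The helpers q(theta, s, a) and grad_q(theta, s, a) (the gradient of Q with respect to theta) are hypothetical stand-ins for whatever architecture is used; this illustrates the two-time-scale matrix-gain recursion in spirit, not the paper's exact implementation.

```python
# A sketch of one Zap Q-learning update, assuming hypothetical helpers
# q(theta, s, a) and grad_q(theta, s, a) supplied by the caller.
import numpy as np

def zap_q_step(theta, A_hat, transition, q, grad_q, actions,
               gamma=0.99, alpha=0.01, beta=0.05):
    """One two-time-scale Zap step: fast matrix-gain estimate, slow parameters."""
    s, a, r, s_next = transition
    zeta = grad_q(theta, s, a)                       # eligibility vector
    a_star = max(actions, key=lambda b: q(theta, s_next, b))
    delta = r + gamma * q(theta, s_next, a_star) - q(theta, s, a)  # TD error

    # Sample of the mean-flow Jacobian A(theta) ~ E[zeta (gamma grad_q' - grad_q)^T]
    A_sample = np.outer(zeta, gamma * grad_q(theta, s_next, a_star) - zeta)
    A_hat = A_hat + beta * (A_sample - A_hat)        # fast timescale (beta >> alpha)

    # Newton-Raphson-style step with matrix gain -A_hat^{-1} (pseudo-inverse for safety)
    theta = theta - alpha * np.linalg.pinv(A_hat) @ (zeta * delta)
    return theta, A_hat
```

The two step sizes are the point: A_hat is tracked on the faster timescale (beta much larger than alpha), so the parameter recursion behaves like a stochastic Newton-Raphson method, which is the mechanism behind the accelerated convergence claimed above.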
Deep Reinforcement Learning with Gradient Eligibility Traces
Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White
Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently susceptible to divergence. While more principled approaches like Gradient TD (GTD) methods have strong convergence guarantees, they have rarely been used in deep RL. Recent work introduced the Generalized Projected Bellman Error (GPBE), enabling GTD methods to work efficiently with nonlinear function approximation. However, that work is limited to one-step methods, which are slow at credit assignment and require a large number of samples. In this paper, we extend the GPBE objective to support multistep credit assignment based on the $\lambda$-return and derive three gradient-based methods that optimize this new objective. We provide both a forward-view formulation compatible with experience replay and a backward-view formulation compatible with streaming algorithms. Finally, we evaluate the proposed algorithms and show that they outperform both PPO and StreamQ in MuJoCo and MinAtar environments, respectively. Code is available at https://github.com/esraaelelimy/gtd_algos
- North America > Canada > Alberta (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
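For intuition about what a backward-view multistep method looks like, the sketch below shows the classic linear GTD($\lambda$) recursion (Maei, 2011), which the GPBE-based algorithms above generalize. It is shown only to illustrate the eligibility-trace and correction-weight structure; the paper's actual updates differ.

```python
# A sketch of one backward-view GTD(lambda) step with linear features;
# theta are the value weights, w the secondary (correction) weights,
# e the accumulating eligibility trace.
import numpy as np

def gtd_lambda_step(theta, w, e, phi, r, phi_next,
                    gamma=0.99, lam=0.9, alpha=0.01, beta=0.02):
    delta = r + gamma * theta @ phi_next - theta @ phi   # TD error
    e = gamma * lam * e + phi                            # accumulating trace
    theta = theta + alpha * (delta * e - gamma * (1 - lam) * (e @ w) * phi_next)
    w = w + beta * (delta * e - (phi @ w) * phi)         # correction weights
    return theta, w, e
```

Because the trace e and both weight vectors are updated from a single transition, this form is compatible with the streaming setting mentioned in the abstract, with no replay buffer required.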
Review for NeurIPS paper: Zap Q-Learning With Nonlinear Function Approximation
Summary and Contributions: This paper introduces a version of Zap Q-learning that can be applied to arbitrary approximation architectures for Q-functions. A convergence analysis is undertaken, and a version of the algorithm with MLP function approximators is applied to several classical control tasks.

POST-REBUTTAL: I thank the authors for their response. I appreciate the comments around the reorganisation of material and the clarification of some of the technical points I raised. Two main concerns, described below, prevent me from strongly recommending acceptance.
Review for NeurIPS paper: Zap Q-Learning With Nonlinear Function Approximation
The reviewers are generally supportive of the paper. They have provided some very useful feedback, and I highly encourage the authors to incorporate it. Primarily, it would be ideal to complete the paper reorganization as discussed, explain the limitations of the assumption on boundedness of the iterates, provide a toy example where the boundedness assumption alone is not enough to prevent divergence of Q-learning (i.e., even under that assumption, Q-learning diverges but Zap-Q does not), and finally to sweep over the parameters in the empirical comparison (even if that means the outcome is less positive for Zap-Q).
Reviews: Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting
It is well known that the standard TD algorithm widely used in reinforcement learning does not correspond to the gradient of any objective function, and consequently can be unstable when combined with function approximation. Despite the empirical success of methods such as deep RL, which combine vanilla TD with deep learning, TD with nonlinear function approximation is theoretically demonstrably unstable. Much work on fixing this fundamental flaw in RL was in vain until the gradient TD methods of Sutton et al. Unfortunately, while these methods work, their original analysis was flawed, resting on a heuristic derivation. A recent breakthrough by Liu et al. (UAI 2015) showed that gradient TD methods are essentially saddle-point methods: pure gradient methods that optimize not the original gradient TD loss function, but rather the saddle-point loss function that arises when converting the original loss into its dual form.
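In the linear setting the review describes, the saddle-point reformulation can be sketched as follows, using the standard quantities $A = \mathbb{E}[\phi(\phi - \gamma\phi')^\top]$, $b = \mathbb{E}[r\phi]$, and $M = \mathbb{E}[\phi\phi^\top]$ (this notation is an assumption for illustration, not taken from the review):

```latex
% Saddle-point view of the linear gradient-TD objective, after Liu et al. (2015):
% the quadratic loss is rewritten via its convex conjugate in a dual variable w.
\min_\theta \tfrac{1}{2} \bigl\lVert b - A\theta \bigr\rVert_{M^{-1}}^{2}
  = \min_\theta \max_w \Bigl( \langle w,\, b - A\theta \rangle
  - \tfrac{1}{2}\, w^\top M\, w \Bigr)
```

Stochastic gradient descent in $\theta$ and ascent in $w$ on the right-hand side recovers GTD2-style updates, which is the precise sense in which gradient TD methods are pure gradient methods on the dual (saddle-point) loss rather than on the original objective.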
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD($\lambda$), Q-learning, and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a,b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient descent on this function. In this paper, we generalize their work to nonlinear function approximation.
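As a concrete illustration, here is a minimal sketch of the nonlinear TDC-style update for a smooth value function V(s; theta). The helpers grad_v and hvp (a Hessian-vector product of V at s with a vector, e.g. obtained by automatic differentiation) are assumed; the term h is the Hessian correction that appears only in the nonlinear setting.

```python
# A sketch of one nonlinear TDC step for a smooth V(s; theta); grad_v and
# hvp are assumed helpers (gradient and Hessian-vector product of V).
import numpy as np

def nonlinear_tdc_step(theta, w, transition, v, grad_v, hvp,
                       gamma=0.99, alpha=0.005, beta=0.01):
    s, r, s_next = transition
    phi, phi_next = grad_v(theta, s), grad_v(theta, s_next)
    delta = r + gamma * v(theta, s_next) - v(theta, s)   # TD error

    # Hessian correction term: vanishes when V is linear in theta
    h = (delta - phi @ w) * hvp(theta, s, w)

    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next - h)
    w = w + beta * (delta - phi @ w) * phi               # secondary weights
    return theta, w
```

When V is linear in theta the Hessian-vector product is zero and the update reduces to linear TDC, consistent with the claim that this work generalizes the Sutton et al. (2009a,b) algorithms.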
IT2CFNN: An Interval Type-2 Correlation-Aware Fuzzy Neural Network to Construct Non-Separable Fuzzy Rules with Uncertain and Adaptive Shapes for Nonlinear Function Approximation
In this paper, a new interval type-2 fuzzy neural network able to construct non-separable fuzzy rules with adaptive shapes is introduced. To reflect uncertainty, the shapes of the fuzzy sets are considered to be uncertain. Therefore, a new form of interval type-2 fuzzy set, based on a general Gaussian model able to produce different shapes (including triangular, bell-shaped, and trapezoidal), is proposed. To account for interactions among input variables, input vectors are transformed into new feature spaces with uncorrelated variables suitable for defining each fuzzy rule. Next, the new features are fed to a fuzzification layer that uses the proposed interval type-2 fuzzy sets with adaptive shapes. Consequently, interval type-2 non-separable fuzzy rules with appropriate shapes are formed, accounting for both the local interactions of variables and the uncertainty. For type reduction, the contributions of the upper and lower firing strengths of each fuzzy rule are selected adaptively and separately. To train the parameters of the network, the Levenberg-Marquardt optimization method is utilized. The performance of the proposed method is investigated on clean and noisy datasets to demonstrate its ability to handle uncertainty. Moreover, the proposed paradigm is successfully applied to real-world time-series prediction, regression problems, and nonlinear system identification. According to the experimental results, the proposed model outperforms other methods while using a more parsimonious structure.
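As a small illustration of the interval type-2 idea, the sketch below implements a Gaussian membership function whose width is uncertain within $[\sigma_{lo}, \sigma_{hi}]$, returning lower and upper membership degrees. This is a simplified assumption for illustration; the paper's general Gaussian model additionally adapts the shape of the set (triangular, bell-shaped, trapezoidal), which this example omits.

```python
# A sketch of an interval type-2 Gaussian membership function with an
# uncertain width in [sigma_lo, sigma_hi].
import numpy as np

def it2_gaussian_membership(x, center, sigma_lo, sigma_hi):
    """Return (lower, upper) membership degrees of input x."""
    mu_narrow = np.exp(-0.5 * ((x - center) / sigma_lo) ** 2)
    mu_wide = np.exp(-0.5 * ((x - center) / sigma_hi) ** 2)
    # For a shared center, the wider Gaussian dominates the narrower one,
    # so the pair brackets the uncertain membership from below and above.
    return np.minimum(mu_narrow, mu_wide), np.maximum(mu_narrow, mu_wide)
```

A type-reduction step would then combine the lower and upper firing strengths of each rule; the paper selects the contribution of each adaptively and separately, rather than with a fixed mixing weight.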