- North America > United States > California (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (2 more...)
Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration
Yu, Huizhen, Wan, Yi, Sutton, Richard S.
This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision processes (SMDPs). We establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. In particular, we show that the algorithm converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation, with convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions. Moreover, to make full use of the SA framework, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework and are addressed through novel arguments in the stability and convergence analysis of RVI Q-learning.
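The core update behind RVI Q-learning can be sketched on a toy problem. Below is a minimal synchronous sketch, assuming a hand-made two-state SMDP with one action per state and the simplest choice f(Q) = Q[reference state]; the abstract's actual setting is asynchronous, with general monotone estimators of the reward rate, so this only illustrates the shape of the update, not the paper's algorithm.

```python
import numpy as np

# Assumed toy weakly communicating SMDP (not from the paper):
# state 0: reward 2 accrued over holding time 2, then -> state 1
# state 1: reward 0 accrued over holding time 1, then -> state 0
# Optimal reward rate rho* = (2 + 0) / (2 + 1) = 2/3.
rewards = np.array([2.0, 0.0])
holding = np.array([2.0, 1.0])
next_state = np.array([1, 0])

Q = np.zeros(2)   # one action per state, so Q is indexed by state only
ref = 0           # f(Q) = Q[ref]: reference-state estimate of the reward rate
alpha = 0.1       # constant stepsize for this deterministic sketch

for _ in range(2000):
    for s in range(2):
        rho = Q[ref]  # current estimate of the optimal reward rate
        # Relative-value-iteration target: reward, minus rate * holding time,
        # plus the value of the successor state.
        target = rewards[s] - rho * holding[s] + Q[next_state[s]]
        Q[s] += alpha * (target - Q[s])

print(Q[ref])  # estimated optimal reward rate, approx. 2/3
```

The holding-time factor multiplying the rate estimate is what distinguishes the SMDP update from the MDP case, where all transitions take unit time.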
- North America > Canada > Alberta (0.14)
- North America > United States > New York (0.04)
- Asia > Singapore (0.04)
- Asia > India > NCT > New Delhi (0.04)
Provable Benefits of Sinusoidal Activation for Modular Addition
This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths. In contrast, the width of ReLU networks must scale linearly with $m$ to interpolate, and they cannot simultaneously fit two lengths with different residues modulo $p$. We then provide a novel Natarajan-dimension generalization bound for sine networks, yielding nearly optimal sample complexity $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks. We also derive width-independent, margin-based generalization for sine networks in the overparametrized regime and validate it. Empirically, sine networks generalize consistently better than ReLU networks across regimes and exhibit strong length extrapolation.
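The expressivity claim rests on the fact that modular addition is linearly decodable from sinusoidal features. A small self-contained check (my own illustration, not the paper's construction): score each candidate residue c by cos(2*pi*(a + b - c)/p), which by the angle-addition identity is a combination of two sine-neuron outputs, and verify the argmax recovers (a + b) mod p.

```python
import numpy as np

p = 17
rng = np.random.default_rng(0)
a = rng.integers(0, p, size=200)
b = rng.integers(0, p, size=200)

# Logit for class c: cos(2*pi*(a + b - c)/p), maximal iff c = (a + b) mod p.
# Angle-addition identity: cos(t - phi) = sin(t + pi/2)*cos(phi) + sin(t)*sin(phi),
# i.e. each logit is a linear combination of two sine-activated units,
# which is the flavor of width-2 realization the expressivity result refers to.
c = np.arange(p)
logits = np.cos(2 * np.pi * (a[:, None] + b[:, None] - c[None, :]) / p)
pred = logits.argmax(axis=1)
print((pred == (a + b) % p).all())
```

Because the features are periodic in a + b, the same decoder works for any summand length, which is consistent with the length-extrapolation behavior the abstract reports.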
- North America > United States > Illinois > Cook County > Chicago (0.40)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Appendix: On Infinite-Width Hypernetworks
To further demonstrate the behavior reported in Figure 1 (main text), we verified that it is consistent regardless of the value of the learning rate. However, for earlier epochs, the performance improves for shallower and wider architectures.

As a consequence of Thm. 1, we show in Sec. 3 that terms of the form in Eq. 5 represent high-order terms in the multivariate Taylor expansion of …. In this section, we prove Lem. 3, which is the main technical lemma that enables us to prove Thm. 1. To estimate the order of magnitude of the expression in Eq. 7, we provide an explicit expression for …. By Eqs. 14 and 10, we see that ….

Lemma 2. The following holds: ….

Lemma 3. Let k ≥ 0 and sets l = {l…}. The case k = 0 is trivial. By Eq. 16, it holds that ….

Lemma 4. Let h(u; w) = g(z; f(x; w)) be a hypernetwork.
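The composition in Lemma 4, h(u; w) = g(z; f(x; w)), is the defining structure of a hypernetwork: f maps its input through meta-weights w to the parameters of the inner network g. A minimal sketch of this forward pass, with shapes and the tanh nonlinearity chosen purely for illustration (the paper analyzes the infinite-width limit, which this does not capture):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, w):
    """Weight generator: maps input x, through meta-weights w, to the
    flat parameter vector of the inner network g."""
    return np.tanh(w @ x)

def g(z, theta):
    """Inner network: here a single linear layer whose weights are
    produced by f rather than learned directly."""
    W = theta.reshape(2, 3)
    return W @ z

# h(u; w) = g(z; f(x; w)) with u = (x, z), as in Lemma 4.
w = rng.normal(size=(6, 4))   # meta-weights (the only trainable parameters)
x = rng.normal(size=4)        # input to the weight generator
z = rng.normal(size=3)        # input to the inner network
out = g(z, f(x, w))
print(out.shape)
```

Only w is trained; gradients flow through g into f, which is why the Taylor-expansion terms analyzed in the appendix involve both networks' derivatives.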
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
- North America > Canada (0.04)
- Europe > Ireland (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)
Appendix A Definition of Gâteaux and Fréchet derivatives
Whence we get the simpler assumption (A…). In other cases, one needs more specific theorems, such as the Dunford-Pettis theorem for L…. The other conditions are difficult to verify for given functionals. We conclude by using Lemma 13.

In this space, the shortest-distance paths between measures are given by their squared-norm distance. While both frameworks yield optimisation algorithms on measure spaces, the geometries and algorithms are very different. F is decreasing at each iteration.
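For reference, the two notions named in this appendix's heading can be stated in their standard textbook form (this is the generic definition on a normed space, not necessarily the paper's exact formulation on measure spaces):

```latex
Let $F \colon X \to \mathbb{R}$ be a functional on a normed space $X$.
The \emph{G\^ateaux derivative} of $F$ at $\mu$ in direction $\nu$ is
\[
  dF(\mu;\nu) \;=\; \lim_{t \to 0} \frac{F(\mu + t\nu) - F(\mu)}{t},
\]
when the limit exists for every $\nu \in X$ and $\nu \mapsto dF(\mu;\nu)$
is linear and continuous. $F$ is \emph{Fr\'echet differentiable} at $\mu$
if there is a bounded linear map $DF(\mu) \colon X \to \mathbb{R}$ with
\[
  F(\mu + \nu) \;=\; F(\mu) + DF(\mu)[\nu] + o(\|\nu\|)
  \quad \text{as } \|\nu\| \to 0.
\]
Fr\'echet differentiability implies G\^ateaux differentiability, with
$dF(\mu;\nu) = DF(\mu)[\nu]$; the converse fails in general.
```

The gap between the two notions is exactly why the stronger assumptions discussed above are needed for some functionals.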
- North America > United States (0.14)
- North America > Canada (0.04)