Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration

Yu, Huizhen, Wan, Yi, Sutton, Richard S.

arXiv.org Artificial Intelligence

This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision processes (SMDPs). We establish the convergence of an asynchronous SA analogue of Schweitzer's classical relative value iteration algorithm, RVI Q-learning, for finite-space, weakly communicating SMDPs. In particular, we show that the algorithm converges almost surely to a compact, connected subset of solutions to the average-reward optimality equation, with convergence to a unique, sample path-dependent solution under additional stepsize and asynchrony conditions. Moreover, to make full use of the SA framework, we introduce new monotonicity conditions for estimating the optimal reward rate in RVI Q-learning. These conditions substantially expand the previously considered algorithmic framework and are addressed through novel arguments in the stability and convergence analysis of RVI Q-learning.
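As a rough sketch (not the paper's algorithm, which is asynchronous and allows a broad family of reward-rate estimators $f$), one tabular RVI Q-learning update for a single SMDP transition can be written as follows; the function names, the single-transition form, and the reference-value choice of $f$ mentioned in the comment are assumptions made for illustration.

```python
import numpy as np

def rvi_q_update(Q, s, a, r, tau, s_next, alpha, f):
    """One tabular RVI Q-learning update for an average-reward SMDP transition.

    Q       : 2-D array of state-action values, shape (num_states, num_actions)
    s, a    : current state and action indices
    r       : cumulative reward accrued over the transition
    tau     : holding (sojourn) time of the transition
    s_next  : next state index
    alpha   : stepsize
    f       : callable mapping Q to a scalar estimate of the optimal reward rate
    """
    reward_rate = f(Q)  # e.g. the value at a fixed reference pair: f = lambda Q: Q[0, 0]
    td_error = r - reward_rate * tau + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q
```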


Provable Benefits of Sinusoidal Activation for Modular Addition

Huang, Tianlong, Li, Zhiyuan

arXiv.org Machine Learning

This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths. In contrast, the width of ReLU networks must scale linearly with $m$ to interpolate, and they cannot simultaneously fit two lengths with different residues modulo $p$. We then provide a novel Natarajan-dimension generalization bound for sine networks, yielding nearly optimal sample complexity $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks. We also derive width-independent, margin-based generalization for sine networks in the overparametrized regime and validate it. Empirically, sine networks generalize consistently better than ReLU networks across regimes and exhibit strong length extrapolation.
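As a purely illustrative check (assumed names and decoding; this is not the paper's width-2 construction, which reads its features out through a trained linear layer), two sine units applied to the raw sum $a+b$, differing only by a bias (phase) shift, already determine $(a+b) \bmod p$ exactly:

```python
import numpy as np

p = 7  # illustrative modulus

def sine_features(a, b, p):
    """Two sine 'neurons' on the linear preactivation a + b (a width-2 hidden layer)."""
    z = a + b
    h1 = np.sin(2 * np.pi * z / p)                # sine unit at frequency 2*pi/p
    h2 = np.sin(2 * np.pi * z / p + np.pi / 2)    # same unit with a phase (bias) shift
    return h1, h2

def decode_residue(h1, h2, p):
    """Recover (a + b) mod p from the angle encoded by the two features."""
    angle = np.arctan2(h1, h2)                    # angle of the (cos, sin) pair
    return int(round(angle * p / (2 * np.pi))) % p

for a in range(p):
    for b in range(p):
        h1, h2 = sine_features(a, b, p)
        assert decode_residue(h1, h2, p) == (a + b) % p
print("width-2 sine features determine (a+b) mod p exactly for p =", p)
```

Because a sine unit is periodic in its preactivation, a single frequency captures the residue class of $a+b$; a ReLU feature of $a+b$ has no such periodicity, which is the intuition behind the expressivity gap described above.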


Appendix: On Infinite-Width Hypernetworks

Neural Information Processing Systems

To further demonstrate the behavior reported in Figure 1 (main text), we verified that it is consistent regardless of the value of the learning rate. However, for earlier epochs, the performance improves for shallower and wider architectures. As a consequence of Thm. 1, terms of the form in Eq. 5 represent high-order terms in the multivariate Taylor expansion. In this section, we prove Lem. 3, which is the main technical lemma that enables us to prove Thm. 1; to estimate the order of magnitude of the expression in Eq. 7, we provide an explicit expression and apply Eqs. 14 and 10. The case k = 0 of Lemma 3 is trivial, and the general case uses Eq. 16. Lemma 4 considers a hypernetwork h(u; w) = g(z; f(x; w)).
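For orientation, the composition $h(u; w) = g(z; f(x; w))$ means that the hypernetwork $f$ maps the input $x$ to the weights of a primary network $g$, which is then evaluated at $z$. The sketch below only illustrates this composition; the dimensions, the linear choice of $f$, and the two-layer form of $g$ are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions
d_x, d_z, d_out, d_hidden = 4, 3, 2, 5

# Parameters w of the hypernetwork f: here, a linear map from x to the weights of g.
n_g_params = d_hidden * d_z + d_out * d_hidden
W = rng.normal(size=(n_g_params, d_x)) / np.sqrt(d_x)

def f(x, W):
    """Hypernetwork: maps the input x to a flat vector of weights for g."""
    return W @ x

def g(z, theta):
    """Primary network: a two-layer MLP whose weights theta are produced by f."""
    W1 = theta[: d_hidden * d_z].reshape(d_hidden, d_z)
    W2 = theta[d_hidden * d_z :].reshape(d_out, d_hidden)
    return W2 @ np.tanh(W1 @ z)

def h(x, z, W):
    """The composition h(u; w) = g(z; f(x; w)), with u = (x, z)."""
    return g(z, f(x, W))

print(h(rng.normal(size=d_x), rng.normal(size=d_z), W))
```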



Provable Gradient Variance Guarantees for Black-Box Variational Inference

Neural Information Processing Systems

Recent variational inference methods use stochastic gradient estimators whose variance is not well understood. Theoretical guarantees for these estimators are important for understanding when these methods will or will not work.
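As one concrete instance of such an estimator (a minimal sketch assuming a mean-field Gaussian variational family and a toy target density; the names and setup are illustrative rather than the paper's), the reparameterization gradient of the ELBO is a per-sample random vector, and its variance is the kind of quantity these guarantees bound:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(z):
    """Gradient of an illustrative target log-density: a standard Gaussian."""
    return -z

def elbo_grad_sample(mu, log_sigma, rng):
    """One reparameterized sample of the ELBO gradient for q = N(mu, diag(sigma^2)).

    Writing z = mu + sigma * eps with eps ~ N(0, I) lets the gradient of
    E_q[log p(z)] + entropy(q) pass through the sampled z.
    """
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=mu.shape)
    z = mu + sigma * eps
    g = grad_log_p(z)
    grad_mu = g                             # d/dmu of log p(mu + sigma * eps)
    grad_log_sigma = g * eps * sigma + 1.0  # chain rule plus the entropy term
    return grad_mu, grad_log_sigma

mu, log_sigma = np.zeros(2), np.zeros(2)
grads = [elbo_grad_sample(mu, log_sigma, rng) for _ in range(1000)]
print("empirical variance of the mu-gradient:",
      np.var([g_mu for g_mu, _ in grads], axis=0))
```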



Appendix A Definition of Gâteaux and Fréchet derivatives

Neural Information Processing Systems

Whence we get the simpler assumption (A''). In other cases, one needs more specific theorems, such as the Dunford-Pettis theorem for $L^1$ spaces; the other conditions are difficult to verify for given functionals. We conclude by using Lemma 13. In this space, the shortest-distance paths between measures are given by their squared-norm distance. While both frameworks yield optimisation algorithms on measure spaces, the geometries and the resulting algorithms are very different. $F$ is decreasing at each iteration.
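For reference, the standard definitions behind the appendix title can be stated as follows (generic notation; it may differ from the appendix's own):

```latex
% Standard definitions, stated for a functional F on a normed space X.
Let $F : X \to \mathbb{R}$ be a functional on a normed space $X$.

\textbf{G\^ateaux derivative.} $F$ is G\^ateaux differentiable at $x$ if, for every
direction $h \in X$, the limit
\[
  dF(x; h) \;=\; \lim_{t \to 0} \frac{F(x + t h) - F(x)}{t}
\]
exists and $h \mapsto dF(x; h)$ is linear and continuous.

\textbf{Fr\'echet derivative.} $F$ is Fr\'echet differentiable at $x$ if there exists a
bounded linear map $DF(x) : X \to \mathbb{R}$ such that
\[
  \lim_{\|h\| \to 0} \frac{\bigl| F(x + h) - F(x) - DF(x)[h] \bigr|}{\|h\|} \;=\; 0 .
\]
Fr\'echet differentiability implies G\^ateaux differentiability, and the two
derivatives then coincide.
```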