Neural Arithmetic Logic Units
Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, Phil Blunsom
Specifically, one frequently observes failures when quantities that lie outside the numerical range used during training are encountered at test time, even when the target function is simple (e.g., it depends only on aggregating counts or linear extrapolation). This failure pattern indicates that the learned behavior is better characterized by memorization than by systematic abstraction.
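The extrapolation behavior described above is what the paper's neural accumulator (NAC) cell is built to recover: its effective weight matrix is tanh(Ŵ) ⊙ σ(M̂), which saturates toward {-1, 0, 1}, so the cell represents exact additions and subtractions that hold outside the training range. A minimal NumPy sketch (parameter values chosen here only to show the saturated regime):

```python
import numpy as np

def nac_forward(x, W_hat, M_hat):
    """NAC cell: effective weights tanh(W_hat) * sigmoid(M_hat) saturate
    toward {-1, 0, 1}, so the output is a (near-)exact signed sum of inputs."""
    W = np.tanh(W_hat) * (1.0 / (1.0 + np.exp(-M_hat)))  # elementwise gate
    return W @ x

# With saturated parameters the cell computes an almost exact sum,
# regardless of how large the inputs are:
W_hat = np.array([[10.0, 10.0]])   # tanh(10) is close to 1
M_hat = np.array([[10.0, 10.0]])   # sigmoid(10) is close to 1
x = np.array([3.0, 4.0])
print(nac_forward(x, W_hat, M_hat))  # close to [7.0]
```

Because the computation is a plain linear map with weights pinned near integers, inputs far outside any training range are handled identically, which is the systematic behavior a saturating MLP lacks.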
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > United States > California > Orange County > Irvine (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Neural Attentive Circuits
Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned.
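To make "sparsely interacting modules" concrete, here is a hypothetical top-k routing sketch in NumPy: each token is scored against every module and processed only by its best-matching module, so most token-module pairs never interact. The scoring rule and module form here are invented for illustration and are not the paper's actual mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def route_to_modules(tokens, module_weights, k=1):
    """Illustrative sparse routing: score each token against a per-module
    signature vector, then apply only the top-k modules to each token."""
    signatures = np.stack([w.mean(axis=0) for w in module_weights])  # (M, d) toy signatures
    scores = tokens @ signatures.T                                    # (T, M) affinity scores
    chosen = np.argsort(-scores, axis=1)[:, :k]                       # top-k module indices per token
    out = np.zeros_like(tokens)
    for t, mods in enumerate(chosen):
        for m in mods:
            out[t] += np.tanh(tokens[t] @ module_weights[m]) / k      # module m transforms token t
    return out, chosen

tokens = rng.normal(size=(4, 8))                     # 4 tokens, dim 8
modules = [rng.normal(size=(8, 8)) for _ in range(3)]  # 3 candidate modules
out, chosen = route_to_modules(tokens, modules, k=1)
```

The joint-learning difficulty the abstract points to is visible even in this toy: the routing decision (`chosen`) is discrete, so gradients flow only through the modules that were selected, entangling what each module computes with which inputs it ever sees.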
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but under independent and identically distributed (i.i.d.) sampling and single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with actor having general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-2}/\log(1/\epsilon))$. Moreover, the sample complexity of AC and NAC characterized in this work outperforms that of policy gradient (PG) and natural policy gradient (NPG) by a factor of $\mathcal{O}((1-\gamma)^{-3})$ and $\mathcal{O}((1-\gamma)^{-4}\epsilon^{-2}/\log(1/\epsilon))$, respectively. This is the first theoretical study establishing that AC and NAC attain orderwise performance improvement over PG and NPG under infinite horizon due to the incorporation of critic.
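The mini-batch actor-critic scheme the abstract analyzes — collect a mini-batch of Markovian transitions, then update actor and critic from batch averages of the TD error — can be sketched in tabular form on a hypothetical two-state MDP (the environment, step sizes, and batch size below are all invented for illustration; the paper treats general policy-class approximation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-state MDP: action 1 always moves to state 1; the pair (1, 1) pays
# reward 1, everything else pays 0. Discount factor gamma < 1.
P = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
R = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
gamma = 0.9

theta = np.zeros((2, 2))   # actor: softmax policy logits per state
V = np.zeros(2)            # critic: tabular state-value estimates
alpha_actor, alpha_critic, batch = 0.05, 0.1, 16

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for step in range(2000):
    # One mini-batch of transitions drawn from the ongoing Markovian
    # trajectory (no i.i.d. resets), as in the mini-batch AC setting.
    grads, td_sum, counts = np.zeros_like(theta), np.zeros(2), np.zeros(2)
    for _ in range(batch):
        pi = policy(s)
        a = rng.choice(2, p=pi)
        s2, r = P[(s, a)], R[(s, a)]
        delta = r + gamma * V[s2] - V[s]       # TD error as advantage estimate
        grads[s] += delta * (np.eye(2)[a] - pi)  # log-policy gradient term
        td_sum[s] += delta
        counts[s] += 1
        s = s2
    theta += alpha_actor * grads / batch                     # averaged actor step
    V += alpha_critic * np.where(counts > 0, td_sum / np.maximum(counts, 1), 0.0)

# The learned policy should come to prefer action 1 in both states.
```

Averaging the TD errors over the mini-batch before each update is the variance-reduction ingredient behind the improved bounds: intuitively, larger batches cut the per-update noise from Markovian sampling, which is where the $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$-order saving comes from.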
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > Canada > Quebec > Montreal (0.04)
2dace78f80bc92e6d7493423d729448e-Reviews.html
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. It presents a slight modification of the NAC algorithm, called forgetful NAC, of which the original algorithm is a special case. The authors show that forgetful NAC and optimistic policy iteration are equivalent. The authors also present a non-optimality result for the soft-greedy Gibbs distribution, i.e., the optimal solution is not a fixed point of the policy iteration algorithm. I liked the unified view on both types of algorithms.
- Summary/Review (0.48)
- Research Report > New Finding (0.35)