 feedforward neural network


4547dff5fd7604f18c8ee32cf3da41d7-Supplemental.pdf

Neural Information Processing Systems

To train every agent, we use a distributed framework for simulation and training. For simulation, we run 6400 Hanabi environments in parallel, and the trajectories are batched together for efficient GPU computation. This is efficient because every thread can hold many environments in which many agents interact. Every agent chooses actions through neural network calls, which are more computationally intensive and are executed on GPUs. Making these calls asynchronously allows a thread to support multiple environments while it waits for prior agents' actions to be computed.
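The asynchronous batching described above can be illustrated with a small single-threaded sketch. It assumes Python's `asyncio` in place of the authors' (unspecified) threading setup; `BatchedPolicy`, `run_env`, the toy transition, and the placeholder `_forward` are hypothetical stand-ins for the GPU policy and the Hanabi environments. Each environment awaits its action, so the event loop can step other environments while their requests accumulate into one batched call.

```python
import asyncio
import contextlib


class BatchedPolicy:
    """Hypothetical stand-in for a GPU policy: gathers pending action requests
    from many environments and answers them with one batched forward pass."""

    def __init__(self, max_batch: int = 64):
        self.max_batch = max_batch
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: list = []  # size of every batched call, for inspection

    async def act(self, observation: int) -> int:
        # Submit the observation and wait; while this environment waits,
        # the event loop is free to step other environments.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((observation, fut))
        return await fut

    def _forward(self, observations):
        # Placeholder for a batched neural-network call on the GPU.
        return [obs % 5 for obs in observations]

    async def serve(self):
        while True:
            batch = [await self.queue.get()]
            # Drain every request that is already waiting, up to max_batch.
            while len(batch) < self.max_batch and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            actions = self._forward([obs for obs, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, fut), action in zip(batch, actions):
                fut.set_result(action)


async def run_env(env_id: int, policy: BatchedPolicy, num_steps: int) -> int:
    obs, total = env_id, 0
    for _ in range(num_steps):
        action = await policy.act(obs)
        obs = obs + action + 1  # toy environment transition
        total += action
    return total


async def main(num_envs: int = 32, num_steps: int = 10):
    policy = BatchedPolicy()
    server = asyncio.create_task(policy.serve())
    results = await asyncio.gather(
        *(run_env(i, policy, num_steps) for i in range(num_envs))
    )
    server.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await server
    return results, policy.batch_sizes
```

Calling `asyncio.run(main())` returns per-environment totals together with the recorded batch sizes. Because many environments are waiting at the same time, each batched call serves many requests instead of one, which is the point of batching the neural network calls.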


Appendix

Neural Information Processing Systems

We additionally define the following notation for the proof. In Assumption 3.2, we assume Lipschitz continuity and smoothness for all the activation functions. In the proofs of the lemmas, e.g., Lemmas B.1 and B.2, we only use the facts that the activation functions are Lipschitz continuous and smooth, and that their value at the point 0 is bounded by a positive constant; hence, for simplicity, we denote all the activation functions by a single symbol, as we do in Assumption 3.2. Additionally, in the following we introduce notation for the derivatives, mainly used in the proofs of Lemmas B.1 and B.2. By the definition of feedforward neural networks in Section 2, and unlike standard neural networks such as FCNs and CNNs, in which neurons are generally connected only across adjacent layers, the neurons in feedforward neural networks can be arbitrarily connected as long as there is no loop.
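To make the "arbitrarily connected as long as there is no loop" condition concrete, here is a minimal sketch of a forward pass through a feedforward network on an arbitrary directed acyclic graph, using Python's standard `graphlib.TopologicalSorter` for the ordering. The names `dag_forward`, `preds`, and `weights`, the single shared activation for every non-input neuron, and the 1/sqrt(fan-in) scaling are illustrative assumptions, not the paper's exact definition from Section 2.

```python
import graphlib
import math


def dag_forward(inputs, preds, weights, activation=math.tanh):
    """Forward pass of a feedforward network whose neurons form an arbitrary DAG.

    inputs:  dict, input node -> value
    preds:   dict, non-input neuron -> list of predecessor nodes (any earlier node)
    weights: dict, (src, dst) edge -> weight
    """
    values = dict(inputs)
    # static_order() yields predecessors before successors; iterating it
    # raises graphlib.CycleError if the connections contain a loop.
    for node in graphlib.TopologicalSorter(preds).static_order():
        if node in values:  # input nodes are given, not computed
            continue
        srcs = preds[node]
        # Assumed 1/sqrt(fan-in) scaling; same activation everywhere for simplicity.
        pre = sum(weights[(s, node)] * values[s] for s in srcs) / math.sqrt(len(srcs))
        values[node] = activation(pre)
    return values


# Usage: edges x1 -> h3 and x2 -> out skip layers, which an FCN would not allow.
preds = {
    "h1": ["x1", "x2"],
    "h2": ["x1", "x2"],
    "h3": ["h1", "x1"],
    "out": ["h1", "h2", "h3", "x2"],
}
weights = {(s, d): 1.0 for d, ss in preds.items() for s in ss}
values = dag_forward({"x1": 1.0, "x2": 1.0}, preds, weights, activation=lambda z: z)
```

With unit weights and an identity activation, `values["out"]` equals 1 + 1.25·sqrt(2). Passing a cyclic `preds` (for example `a` feeding `b` and `b` feeding `a`) makes the loop raise `graphlib.CycleError`, mirroring the no-loop requirement.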


Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Neural Information Processing Systems

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.
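As a concrete reading of how "width" is measured here, the following sketch computes the minimum indegree over all neurons outside the input and first layers. The helper `dag_width` and the rule that a neuron belongs to the first layer when all of its predecessors are inputs are assumptions made for illustration; the paper's formal definition may differ in detail.

```python
def dag_width(preds, input_nodes):
    """Hypothetical helper: minimum indegree over neurons outside the
    input and first layers.

    preds: dict, non-input neuron -> list of predecessor nodes
    """
    inputs = set(input_nodes)
    # Assumption: a neuron is in the first layer iff all its predecessors are inputs.
    first_layer = {n for n, ps in preds.items() if ps and all(p in inputs for p in ps)}
    indegrees = [
        len(ps) for n, ps in preds.items() if n not in inputs and n not in first_layer
    ]
    return min(indegrees) if indegrees else None


# Usage on a small DAG with skip connections.
preds = {
    "h1": ["x1", "x2"],
    "h2": ["x1", "x2"],
    "h3": ["h1", "x1"],
    "out": ["h1", "h2", "h3", "x2"],
}
print(dag_width(preds, ["x1", "x2"]))  # 2: h3 has indegree 2, out has indegree 4
```

The first-layer neurons `h1` and `h2` are excluded, so the width is set by `h3`. Growing this minimum indegree is what the abstract means by the "width" approaching infinity.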



78ed45281dd746a265fff16ff75a02e5-Paper-Conference.pdf

Neural Information Processing Systems

Unfortunately, these theoretical results cannot explain the empirical successes of deep learning well, as they require the model size to be no larger than O(n) (the generalization bounds become vacuous otherwise).






4547dff5fd7604f18c8ee32cf3da41d7-Supplemental.pdf

Neural Information Processing Systems

We compute the priority of each trajectory as $\xi = 0.9 \max_i \xi_i + 0.1 \bar{\xi}$ [21], where $\xi_i$ is the TD error at step $i$ and $\bar{\xi}$ is its mean over the trajectory. On the training side, a training loop continuously samples trajectories from the replay buffer and updates the model based on the TD error. The simulation policies are updated to match the training policy every 10 gradient steps. Concretely, each of the games played simultaneously has an agent from a given level. Therefore we refer to this policy as RankBot. Similarly, one might expect a color-based equivalent of RankBot, but in practice we find it difficult to learn such a policy naturally.
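A minimal sketch of the trajectory priority and of proportional sampling from the replay buffer follows. It assumes, as in common prioritized-replay practice rather than anything stated here, that the per-step TD errors are taken in absolute value and that the 0.1-weighted term is their mean. `trajectory_priority` and `sample_indices` are hypothetical names; only the 0.9/0.1 mixture comes from the text.

```python
import random


def trajectory_priority(td_errors, eta=0.9):
    """Priority = eta * max_i |xi_i| + (1 - eta) * mean_i |xi_i|.

    Taking absolute TD errors and using the mean for the second term are
    assumptions borrowed from common prioritized-replay practice.
    """
    abs_err = [abs(e) for e in td_errors]
    return eta * max(abs_err) + (1.0 - eta) * sum(abs_err) / len(abs_err)


def sample_indices(priorities, k, rng=random):
    # Proportional prioritization: trajectory j is drawn with probability
    # priorities[j] / sum(priorities), with replacement.
    return rng.choices(range(len(priorities)), weights=priorities, k=k)


# Usage: a buffer of three trajectories with per-step TD errors.
buffer = [[0.5, -1.0, 0.25], [0.1, 0.1], [-2.0, 0.0, 0.3, 0.1]]
priorities = [trajectory_priority(traj) for traj in buffer]
batch = sample_indices(priorities, k=4, rng=random.Random(0))
```

Weighting the maximum heavily keeps a trajectory with one large TD error from being diluted by many small ones. No priority exponent or importance-sampling correction is applied, since the text does not specify one.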