modern network
Towards Deeper Deep Reinforcement Learning with Spectral Normalization
In computer vision and natural language processing, innovations in model architecture that increase model capacity have reliably translated into gains in performance. In stark contrast with this trend, state-of-the-art reinforcement learning (RL) algorithms often use small MLPs, and gains in performance typically originate from algorithmic innovations. It is natural to hypothesize that small datasets in RL necessitate simple models to avoid overfitting; however, this hypothesis is untested. In this paper we investigate how RL agents are affected by exchanging the small MLPs with larger modern networks with skip connections and normalization, focusing specifically on actor-critic algorithms. We empirically verify that naïvely adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements -- suggesting that more "easy" gains may be had by focusing on model architectures in addition to algorithmic innovations.
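As a rough illustration of the spectral normalization mentioned in the abstract, the sketch below divides a weight matrix by an estimate of its largest singular value obtained by power iteration. The function names `spectral_norm_estimate` and `spectrally_normalize`, the iteration count, and the seed are assumptions made for this example; this is not the authors' implementation, and the abstract does not say which layers are normalized.

```python
import numpy as np


def spectral_norm_estimate(W, n_iters=50, seed=0):
    """Estimate the largest singular value of a nonzero matrix W by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        # Alternate between the right (v) and left (u) singular directions.
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    # With unit vectors u and v, u^T W v approximates the top singular value.
    return float(u @ W @ v)


def spectrally_normalize(W, n_iters=50):
    """Return W rescaled so that its spectral norm is approximately 1."""
    return W / spectral_norm_estimate(W, n_iters)


# Hypothetical weight matrix, used only to exercise the functions.
W_example = np.diag([3.0, 1.0, 0.5])
W_normalized = spectrally_normalize(W_example)
```

Rescaling a layer this way bounds how much it can stretch its input, which is the smoothing effect the abstract refers to. Practical implementations usually keep `u` between training steps and run a single iteration per update rather than restarting from a random vector.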
We thank the reviewers for their time and constructive feedback on the submission, which we will incorporate to improve our manuscript.
We find that they are positive-definite as expected. Supervised Differentiable Programming" by Chizat and Bach is an important contribution and we will absolutely Sec 2.2 in V1, V2) are restricted to single-hidden-layer networks. It is still an open research question to determine what are the main factors that determine these performance gaps. We will expand discussion around this.
Reviews: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
The paper was proofread, well-structured, and very clear. The experiments were clearly described in detail, and provided relevant results. Below we outline some detailed comments of the results. In particular, Chizat and Bach prove that the training of an NTK parameterized network is closely modeled by "lazy training" (their terminology for a linearized model). This paper is not referenced in the related work section.
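For readers unfamiliar with the term, a "linearized model" in this sense is usually taken to be the first-order Taylor expansion of the network in its parameters around their initial values. The symbols below ($f$ for the network output, $x$ for an input, $\theta$ for the parameters, $\theta_0$ for their values at initialization) are notation assumed for this sketch and are not taken from the review itself.

```latex
f^{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top}\,(\theta - \theta_0)
```

The model is linear in $\theta$ but in general still nonlinear in the input $x$.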
A path-norm toolkit for modern networks: consequences, promises and challenges
Gonon, Antoine, Brisebarre, Nicolas, Riccietti, Elisa, Gribonval, Rémi
This work introduces the first toolkit around path-norms that is fully able to encompass general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on feedforward networks compared to the product of operators' norms, another complexity measure most commonly used. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.
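To make the notion of a path-norm concrete, here is a minimal sketch of the classical L1 path-norm for a plain layered ReLU network without biases: the sum, over all input-to-output paths, of the product of absolute weights along the path. It also illustrates the rescaling symmetry mentioned in the abstract. This is only the simplest special case under the stated assumptions, not the extended toolkit of the paper (no biases, skip connections, or pooling), and the helper names `l1_path_norm` and `rescale_hidden_neuron` are hypothetical.

```python
import numpy as np


def l1_path_norm(weights):
    """L1 path-norm of a bias-free layered network.

    weights[i] has shape (out_i, in_i). Summing the products of absolute
    weights over all paths equals 1^T |W_L| ... |W_1| 1.
    """
    acc = np.ones(np.asarray(weights[0]).shape[1])
    for W in weights:
        acc = np.abs(np.asarray(W, dtype=float)) @ acc
    return float(acc.sum())


def rescale_hidden_neuron(weights, layer, neuron, lam):
    """Scale a hidden neuron's incoming weights by lam and outgoing by 1/lam.

    For lam > 0 this leaves the function computed by a ReLU network unchanged.
    """
    out = [np.array(W, dtype=float) for W in weights]  # copies
    out[layer][neuron, :] *= lam
    out[layer + 1][:, neuron] /= lam
    return out


# Hypothetical two-layer network: 2 inputs, 2 hidden units, 1 output.
example_weights = [np.array([[1.0, -2.0], [3.0, 4.0]]), np.array([[1.0, -1.0]])]
rescaled_weights = rescale_hidden_neuron(example_weights, layer=0, neuron=0, lam=5.0)
```

Computing the norm takes one pass of matrix-vector products with the absolute weights, which reflects the ease of computation the abstract mentions. The rescaled network has different individual weights but the same path-norm.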
Why do Modern Networks Require AIOps?
Over the past decade, network operations teams have had to deal with a number of issues in their networks--from increased complexity to more distributed environments. With AIOps, you can start optimizing your networks now and prepare for the future. AIOps lets you manage your network like never before. According to Gartner, AIOps combines big data and machine learning to automate IT operations processes such as event correlation, anomaly detection, and causality determination to name a few. It can be defined as the application of machine learning (ML) and data science to IT operations problems.