spectral normalization
Towards Deeper Deep Reinforcement Learning with Spectral Normalization
In computer vision and natural language processing, innovations in model architecture that increase model capacity have reliably translated into gains in performance. In stark contrast with this trend, state-of-the-art reinforcement learning (RL) algorithms often use small MLPs, and gains in performance typically originate from algorithmic innovations. It is natural to hypothesize that small datasets in RL necessitate simple models to avoid overfitting; however, this hypothesis is untested. In this paper we investigate how RL agents are affected by exchanging the small MLPs with larger modern networks with skip connections and normalization, focusing specifically on actor-critic algorithms. We empirically verify that naïvely adopting such architectures leads to instabilities and poor performance, likely contributing to the popularity of simple models in practice. However, we show that dataset size is not the limiting factor, and instead argue that instability from taking gradients through the critic is the culprit. We demonstrate that spectral normalization (SN) can mitigate this issue and enable stable training with large modern architectures. After smoothing with SN, larger models yield significant performance improvements -- suggesting that more "easy" gains may be had by focusing on model architectures in addition to algorithmic innovations.
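For readers unfamiliar with spectral normalization, the following is a minimal sketch of the general technique, not the paper's implementation: it estimates the largest singular value of a weight matrix by power iteration and divides the weights by it, so the resulting linear map has spectral norm close to 1. The function names (`spectral_norm`, `spectrally_normalize`), the iteration count, and the list-of-lists matrix representation are all assumptions made for illustration.

```python
import math
import random


def _matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _matvec_t(W, u):
    """Compute W.T @ u for a matrix stored as a list of rows."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def _l2_normalize(x, eps=1e-12):
    """Scale x to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in x))
    return [a / (norm + eps) for a in x]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration.

    Hypothetical helper for illustration; n_iters and seed are assumed defaults.
    """
    rng = random.Random(seed)
    u = _l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    v = None
    for _ in range(n_iters):
        v = _l2_normalize(_matvec_t(W, u))  # right singular vector estimate
        u = _l2_normalize(_matvec(W, v))    # left singular vector estimate
    # sigma is approximately u^T W v once u and v have converged
    return sum(a * b for a, b in zip(u, _matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W / sigma(W). Assumes W is not the zero matrix."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectrally_normalize(W)
```

In practice, deep learning frameworks ship their own spectral normalization utilities that keep a running estimate of `u` across training steps instead of iterating to convergence each time; the sketch above only shows the underlying computation.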
Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture Appendix A Proofs and Derivations
The KL term in Equation (8) has an analytical expression because both q(uv) and p(uv) are Gaussian distributions. However, the log likelihood term is not analytical yet. In this section, we provide detailed experimental settings and additional experimental results for the synthetic dataset experiment in Appendix B.1, the robustness to noise experiment in Appendix B.2, and the OOD sample detection experiment in Appendix B.3. B.1 Synthetic Dataset Experiment. Dataset: The original moon dataset in Scikit-learn has two sets of 2D data points: upper unit circle points (class 1) and lower unit circle points (class 2). We modified the original code by changing the radius of the circle, using three radius values (view 1: 1.7, view 2: 1.0, and view 3: 0.3) with a fixed random state.
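The multi-view moon construction described above could be sketched roughly as follows. This is one plausible reading, not the authors' code: it assumes the radius scales both half-circles in the standard two-moons layout, and it omits noise and shuffling, so no random state is needed. The function name `make_scaled_moons`, the `VIEW_RADII` dictionary, and the sample count of 100 per class are hypothetical.

```python
import math


def make_scaled_moons(n_per_class, radius):
    """Generate two interleaving half-circles scaled by `radius`.

    Hypothetical helper: follows the usual two-moons layout (upper arc centered
    at the origin, lower arc shifted right and down), noise-free.
    Requires n_per_class >= 2.
    """
    ts = [math.pi * i / (n_per_class - 1) for i in range(n_per_class)]
    # Class 1: upper half-circle of the given radius
    upper = [(radius * math.cos(t), radius * math.sin(t)) for t in ts]
    # Class 2: lower half-circle, offset as in the standard moons layout
    lower = [(radius * (1 - math.cos(t)), radius * (1 - math.sin(t) - 0.5)) for t in ts]
    points = upper + lower
    labels = [1] * n_per_class + [2] * n_per_class
    return points, labels


# Radius values taken from the text; the view names are used as dictionary keys.
VIEW_RADII = {"view 1": 1.7, "view 2": 1.0, "view 3": 0.3}
views = {name: make_scaled_moons(100, r) for name, r in VIEW_RADII.items()}
```

Each entry of `views` holds the points and class labels for one view, so the same sample index refers to the same position along the arc across all three views.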
4ffb0d2ba92f664c2281970110a2e071-Paper.pdf
The objective of GANs is to produce random samples from a target data distribution, given only access to an initial set of training samples. This is achieved by learning two functions: a generator G, which maps random input noise to a generated sample, and a discriminator D, which tries to classify input samples as either real (i.e., from the training dataset) or fake (i.e., produced by the generator).
543e83748234f7cbab21aa0ade66565f-Paper.pdf
Efficient methods that reliably quantify a deep neural network (DNN)'s predictive uncertainty are important for industrial-scale, real-world applications, which include examples such as object recognition in autonomous driving [22], ad click prediction in online advertising [76], and intent understanding in a conversational system [84].