
On the Adversarial Robustness of Mixture of Experts

Neural Information Processing Systems

Adversarial robustness is a key desirable property of neural networks. It has been empirically shown to be affected by their size, with larger networks typically being more robust. Recently, Bubeck and Sellke [3] proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question: do, and can, functions with more parameters, but not necessarily higher computational cost, have better robustness? We study this question for sparse Mixture of Experts models (MoEs), which make it possible to scale up the model size for a roughly constant computational cost. We theoretically show that under certain conditions on the routing and the structure of the data, MoEs can have significantly smaller Lipschitz constants than their dense counterparts. The robustness of MoEs can suffer when the highest-weighted experts for an input implement sufficiently different functions. We then empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show that they are indeed more robust than dense models with the same computational cost. We make key observations showing the robustness of MoEs to the choice of experts, highlighting the redundancy of experts in models trained in practice.
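To make the scaling argument concrete, here is a minimal sketch of a sparse top-k MoE layer in PyTorch. The linear router, two-layer MLP experts, and all names and shapes are illustrative assumptions, not the paper's implementation; the point is only that parameters grow with the number of experts while each input activates just k of them, keeping per-input compute roughly constant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Illustrative top-k gated mixture of experts: only k experts run per input."""
    def __init__(self, dim, hidden, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # produces routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                            # x: (batch, dim)
        logits = self.router(x)                      # (batch, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)   # route each input to its top-k experts
        weights = F.softmax(weights, dim=-1)         # normalize the selected gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # inputs whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out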


Yiwen Huang (Brown University), Aaron Gokaslan (Cornell University), Volodymyr Kuleshov (Cornell University), James Tompkin (Brown University)

Neural Information Processing Systems

There is a widespread claim that GANs are difficult to train, and GAN architectures in the literature are littered with empirical tricks. We provide evidence against this claim and build a modern GAN baseline in a more principled manner. First, we derive a well-behaved regularized relativistic GAN loss that addresses mode dropping and non-convergence, issues that were previously tackled via a bag of ad-hoc tricks. We analyze our loss mathematically and prove that it admits local convergence guarantees, unlike most existing relativistic losses. Second, this loss allows us to discard all ad-hoc tricks and replace the outdated backbones used in common GANs with modern architectures. Using StyleGAN2 as an example, we present a roadmap of simplification and modernization that results in a new minimalist baseline, R3GAN ("Re-GAN"). Despite being simple, our approach surpasses StyleGAN2 on the FFHQ, ImageNet, CIFAR, and Stacked MNIST datasets, and compares favorably against state-of-the-art GANs and diffusion models.
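As a rough illustration of the loss family described above, the following is a hedged PyTorch sketch of a relativistic pairing loss regularized with zero-centered gradient penalties on both real and fake data. The function names, the choice of softplus, the `gamma` weighting, and the assumption of 4D image tensors are ours for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def d_loss(D, real, fake, gamma=10.0):
    """Sketch of a regularized relativistic discriminator loss.
    D maps images (B, C, H, W) to scalar logits; gamma weights the penalties."""
    real = real.detach().requires_grad_(True)
    fake = fake.detach().requires_grad_(True)
    d_real, d_fake = D(real), D(fake)
    loss = F.softplus(d_fake - d_real).mean()        # relativistic pairing term
    # zero-centered gradient penalties on real (R1) and fake (R2) samples
    (g_real,) = torch.autograd.grad(d_real.sum(), real, create_graph=True)
    (g_fake,) = torch.autograd.grad(d_fake.sum(), fake, create_graph=True)
    penalty = (g_real.square().sum(dim=[1, 2, 3]).mean()
               + g_fake.square().sum(dim=[1, 2, 3]).mean())
    return loss + 0.5 * gamma * penalty

def g_loss(D, real, fake):
    """Generator side of the relativistic pairing loss."""
    return F.softplus(D(real) - D(fake)).mean()
```

The pairing couples each fake score to a real score, which is what gives the relativistic formulation its resistance to mode dropping; the two penalties regularize the discriminator on both data distributions.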


Appendix A Supplementary figures

Neural Information Processing Systems

The production rules are shown in Figure A.2. Compared with MHG, our proposed grammar has better generalization ability; in contrast to MHG, it is based on neighboring relationships. From the 220,011 training molecules, we obtained 1,775 production rules. Each molecule is associated with 28 production rules on average, and the maximum number of production rules associated with a single molecule is 51.


Author Feedback

Neural Information Processing Systems

The following is our response to all major comments. Baselines and benchmarks (Reviewers #2 and #3): we appreciate the reviewers' input on additional baselines and benchmarks, and we will include a fair comparison with the new baseline in the final version. These insights, as well as a suitable ablation study, will also be added in the final version. We will fix the two figures in the final version. Thus, we believe our results are reliable.


SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark

Neural Information Processing Systems

Traditionally, 3D indoor datasets have generally prioritized scale over ground-truth accuracy in order to obtain improved generalization. However, using these datasets to evaluate dense geometry tasks, such as depth rendering, can be problematic, as their meshes are often incomplete and may produce incorrect ground truth for evaluating fine details. In this paper, we propose SCRREAM, a dataset annotation framework that annotates fully dense meshes of the objects in a scene and registers camera poses on the real image sequence, producing accurate ground truth for both sparse and dense 3D tasks. We show the details of the dataset annotation pipeline and showcase four possible variants of datasets that can be obtained from our framework with example scenes: indoor reconstruction and SLAM, scene editing & object removal, human reconstruction, and 6D pose estimation. Recent pipelines for indoor reconstruction and SLAM serve as new benchmarks. In contrast to previous indoor datasets, our design allows dense geometry tasks to be evaluated on eleven sample scenes against accurately rendered ground-truth depth maps.
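For the kind of evaluation against rendered ground-truth depth described above, a minimal sketch of standard dense-depth metrics is shown below. This is a generic evaluation sketch under our own assumptions (invalid pixels marked by zero depth; metric names follow common depth-estimation practice), not the benchmark's actual evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, valid=None):
    """Standard dense-depth metrics against a rendered ground-truth depth map:
    absolute relative error, RMSE, and the delta < 1.25 inlier ratio."""
    mask = (gt > 0) if valid is None else valid       # ignore pixels without ground truth
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```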


Appendix A Double-LSTM

Neural Information Processing Systems

Inspired by the recent work [Dieng et al., 2019] that elucidates how skip-connections promote higher latent information content, we introduce a simple-to-implement modification of a standard LSTM [Hochreiter and Schmidhuber, 1997]. Double-LSTM aims to promote the utilization of the latent variable z while increasing the expressive power of the autoregressive decoder. Double-LSTM consists of two LSTM units [Hochreiter and Schmidhuber, 1997], as depicted in Figure 5. The first LSTM unit is updated based on the latent variable z and the previous hidden state h. The second LSTM unit is updated based on z, h, and the input word embedding w.
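A minimal PyTorch sketch of one decoding step follows. The exact wiring between the two units is not fully specified here, so this is one plausible reading: the second cell consumes the first cell's output alongside z and w, and all dimension names are our assumptions.

```python
import torch
import torch.nn as nn

class DoubleLSTM(nn.Module):
    """Sketch of a Double-LSTM decoder step: two LSTM units per time step."""
    def __init__(self, z_dim, emb_dim, hidden_dim):
        super().__init__()
        self.cell1 = nn.LSTMCell(z_dim, hidden_dim)                         # sees z and its previous hidden state
        self.cell2 = nn.LSTMCell(z_dim + hidden_dim + emb_dim, hidden_dim)  # sees z, h1, and the word embedding w

    def step(self, z, w, state1, state2):
        h1, c1 = self.cell1(z, state1)                 # first unit: latent z + previous hidden state
        x2 = torch.cat([z, h1, w], dim=-1)             # second unit additionally receives w
        h2, c2 = self.cell2(x2, state2)
        return (h1, c1), (h2, c2)                      # h2 feeds the output layer
```

Because the first unit receives only z, every step forces information to flow through the latent variable, which is the intended mechanism for promoting its utilization.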



Finding the Homology of Decision Boundaries with Active Learning

Neural Information Processing Systems

Accurately and efficiently characterizing the decision boundary of classifiers is important for problems related to model selection and meta-learning. Inspired by topological data analysis, the characterization of decision boundaries via their homology has recently emerged as a general and powerful tool. In this paper, we propose an active learning algorithm to recover the homology of decision boundaries. Our algorithm sequentially and adaptively selects the samples whose labels it requires. We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting the best-performing machine learning models for datasets using only their homological summaries. Experiments on several standard datasets show the sample-complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection. Source code for our algorithms and experimental results is available at https://github.
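To make the idea tangible, here is a hypothetical sketch of the overall recipe: adaptively label points near the current boundary estimate, approximate the boundary by midpoints of oppositely labeled pairs, and summarize it with persistence diagrams. The `oracle` callback, the binary-label assumption, the k-NN surrogate, and the margin heuristic are all our illustrative choices, not the paper's algorithm.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from ripser import ripser   # persistent homology of a point cloud

def boundary_homology_active(X_pool, oracle, budget=200, seed=0):
    """Query labels near the estimated decision boundary, then compute the
    persistence diagrams (H0, H1) of a point-cloud proxy for the boundary."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(X_pool), size=10, replace=False))   # seed labels
    y = {i: oracle(X_pool[i]) for i in idx}
    clf = KNeighborsClassifier(n_neighbors=3)
    while len(idx) < budget:
        clf.fit(X_pool[idx], [y[i] for i in idx])
        proba = clf.predict_proba(X_pool)
        margin = np.abs(proba[:, 0] - proba[:, 1])    # small margin ~ near the boundary
        margin[idx] = np.inf                          # never re-query a labeled point
        i = int(np.argmin(margin))
        idx.append(i)
        y[i] = oracle(X_pool[i])
    # midpoints of oppositely labeled pairs approximate the decision boundary
    labels = np.array([y[i] for i in idx])
    P = X_pool[idx]
    mids = [(P[a] + P[b]) / 2 for a in range(len(P)) for b in range(a + 1, len(P))
            if labels[a] != labels[b]]
    return ripser(np.array(mids), maxdim=1)["dgms"]   # homological summary
```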


RLCG

Neural Information Processing Systems

The checklist follows the references. Please read the checklist guidelines carefully for information on how to answer these questions. You are strongly encouraged to include a justification for your answer, either by referencing the appropriate section of your paper or by providing a brief inline description. For example: did you include the license for the code and datasets?


RLCG

Neural Information Processing Systems

Column Generation (CG) is an iterative algorithm for solving linear programs (LPs) with an extremely large number of variables (columns). CG is the workhorse for tackling large-scale integer linear programs, which rely on CG to solve LP relaxations within a branch-and-price algorithm. Two canonical applications are the Cutting Stock Problem (CSP) and the Vehicle Routing Problem with Time Windows (VRPTW). In VRPTW, for example, each binary variable represents the decision to include or exclude a route, of which there are exponentially many; CG incrementally grows the subset of columns in use, ultimately converging to an optimal solution. We propose RLCG, the first Reinforcement Learning (RL) approach for CG.
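For readers unfamiliar with the CG loop, the following is a self-contained sketch of classical Column Generation for the Cutting Stock LP relaxation, not the RLCG agent: solve the restricted master LP, read off duals, price a new column with an unbounded knapsack, and stop when no column has negative reduced cost. Integer widths and the simple greedy initialization are assumptions made for brevity.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_stock_cg(widths, demand, roll_width, tol=1e-6):
    """Column Generation for the Cutting Stock LP relaxation.
    A pattern (column) says how many pieces of each width to cut from one roll."""
    n = len(widths)
    # start with single-item patterns so the master LP is feasible
    patterns = [np.eye(n, dtype=int)[i] * int(roll_width // widths[i]) for i in range(n)]
    while True:
        A = np.array(patterns).T                       # rows: item types, cols: patterns
        c = np.ones(A.shape[1])                        # each pattern costs one roll
        # restricted master LP: min 1'x  s.t.  A x >= demand, x >= 0  (as -A x <= -d)
        res = linprog(c, A_ub=-A, b_ub=-np.array(demand), method="highs")
        duals = -res.ineqlin.marginals                 # duals of the >= constraints
        # pricing: unbounded knapsack  max duals'a  s.t.  widths'a <= roll_width
        best = np.zeros(roll_width + 1)
        choice = np.full(roll_width + 1, -1)
        for cap in range(1, roll_width + 1):
            for i, w in enumerate(widths):
                if w <= cap and best[cap - w] + duals[i] > best[cap]:
                    best[cap] = best[cap - w] + duals[i]
                    choice[cap] = i
        if best[roll_width] <= 1 + tol:                # reduced cost 1 - duals'a >= 0 for all patterns
            return res.fun, patterns                   # LP relaxation solved
        col = np.zeros(n, dtype=int)                   # reconstruct the new pattern
        cap = roll_width
        while choice[cap] != -1:
            col[choice[cap]] += 1
            cap -= widths[choice[cap]]
        patterns.append(col)

# e.g. rolls of width 10; items of widths 3, 5, 7 with demands 9, 5, 4
value, pats = cutting_stock_cg([3, 5, 7], [9, 5, 4], 10)
```

The pricing step is where an RL policy can intervene: rather than always adding the single most negative reduced-cost column, a learned policy can select columns with an eye toward overall convergence.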