Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
Bosch Center for Artificial Intelligence, Renningen, Germany
Deep Gaussian Processes learn probabilistic data representations for supervised learning by cascading multiple Gaussian Processes. While this model family promises flexible predictive distributions, exact inference is not tractable. Approximate inference techniques trade off the ability to closely resemble the posterior distribution against speed of convergence and computational efficiency. We propose a novel Gaussian variational family that allows for retaining covariances between latent processes while achieving fast convergence by marginalising out all global latent variables. After providing a proof of how this marginalisation can be done for general covariances, we restrict them to the ones we empirically found to be most important in order to also achieve computational efficiency. We provide an efficient implementation of our new approach and apply it to several benchmark datasets. It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives.
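To make the contrast concrete, here is a minimal NumPy sketch (our illustration, not the authors' code; the sizes and variable names are hypothetical) of the two variational families over the stacked inducing outputs of all layers: the mean-field family keeps only the per-layer diagonal blocks of the covariance, while the structured family also retains the cross-layer blocks that couple the latent processes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inducing, n_layers = 5, 2        # hypothetical sizes
d = n_inducing * n_layers          # stacked inducing outputs of all layers

# Structured family: one full-rank joint covariance over all layers.
A = rng.standard_normal((d, d))
S_structured = A @ A.T + d * np.eye(d)   # symmetric positive definite

# Mean-field family: keep only the per-layer diagonal blocks,
# discarding all cross-layer covariances.
S_meanfield = np.zeros_like(S_structured)
for layer in range(n_layers):
    sl = slice(layer * n_inducing, (layer + 1) * n_inducing)
    S_meanfield[sl, sl] = S_structured[sl, sl]

# The structured family couples the latent processes across layers:
print(np.abs(S_structured[:n_inducing, n_inducing:]).max() > 0)   # True
print(np.abs(S_meanfield[:n_inducing, n_inducing:]).max() == 0)   # True
```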
Author Feedback
We thank all reviewers for their careful reading and their detailed and constructive comments. We first address the shared reviewer comments and then the individual ones. On 5/8 datasets, STAR DGP significantly outperforms MF DGP (µ > 0.50 + σ), while the opposite holds only … As suggested by R2, we also compared MF to FC DGP, leading to similar results (see the new table). Train-test split (R2): we are the first to study the extrapolation behaviour of DGPs. The results are reported in Sec. S2, and we will move them to the main paper to facilitate comparison to related work.
Hierarchical Decision Making by Generating and Following Natural Language Instructions
Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis
We explore using natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a large number of units must be coordinated across long time scales. We gather a dataset of 76 thousand pairs of instructions and executions from human play, and train instructor and executor models. Experiments show that models that generate intermediate plans in natural language significantly outperform models that directly imitate human actions. The compositional structure of language is conducive to learning generalizable action representations.
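As a rough sketch of the decomposition described above (our own pseudocode-style Python with hypothetical class, method, and environment interfaces, not the paper's released code), the instructor plans in natural language on a long time scale while the executor translates the latest instruction into micro-actions at every step:

```python
class Instructor:
    def generate_instruction(self, observation) -> str:
        """Map the current game observation to a natural-language plan,
        e.g. 'build 3 archers and attack the enemy base'."""
        ...

class Executor:
    def act(self, observation, instruction: str):
        """Map the observation plus the latest instruction to
        micro-actions for every unit."""
        ...

def play_episode(env, instructor: Instructor, executor: Executor,
                 replan_every: int = 50):
    obs, done, t, instruction = env.reset(), False, 0, None
    while not done:
        if t % replan_every == 0:                 # plan on a long time scale
            instruction = instructor.generate_instruction(obs)
        actions = executor.act(obs, instruction)  # act at every tick
        obs, done = env.step(actions)             # assumed env interface
        t += 1
```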
Author Feedback
We would like to thank all the reviewers for their insightful and constructive feedback. This model achieves a win rate comparable to that of the RNN-Discriminative model in Table 3. Finally, we appreciate the reviewers' suggestions of additional citations and interesting future directions. Natural language has several advantages over latent programs. Second, gathering supervision for natural language actions is possible with the framework we introduce.
Empowering Convolutional Neural Networks with MetaSin Activation
Yuxuan Wang
As an alternative, sine networks have shown promising results in learning implicit representations of visual data. However, training these networks in practically relevant settings has proved difficult, requiring careful initialization and dealing with issues due to inconsistent gradients and degenerate local minima. In this work, we instead propose replacing a baseline network's existing activations with a novel ensemble function with trainable parameters.
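One way to realise such a trainable sine-based ensemble activation is sketched below in PyTorch; this is our illustration of the general idea, not the paper's exact MetaSin parameterisation, and the module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class SinEnsembleActivation(nn.Module):
    """Weighted ensemble of sine terms with trainable amplitudes and
    frequencies (an illustration of the idea, not the paper's exact
    MetaSin formulation)."""

    def __init__(self, n_terms: int = 3):
        super().__init__()
        self.amplitude = nn.Parameter(torch.ones(n_terms) / n_terms)
        self.frequency = nn.Parameter(torch.arange(1.0, n_terms + 1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x[..., None] broadcasts against the n_terms trainable frequencies.
        terms = self.amplitude * torch.sin(self.frequency * x[..., None])
        return terms.sum(dim=-1)

# Drop-in replacement for an existing activation, e.g.:
# net = nn.Sequential(nn.Conv2d(3, 16, 3), SinEnsembleActivation(), ...)
```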
Faster width-dependent algorithm for mixed packing and covering LPs
Digvijay Boob, Saurabh Sawlani, Di Wang
In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs. Mixed packing-covering LPs are fundamental to combinatorial optimization in computer science and operations research. Our algorithm finds a (1 + ε)-approximate solution in time O(Nw/ε), where N is the number of nonzero entries in the constraint matrix and w is the maximum number of nonzeros in any constraint. This algorithm is faster than Nesterov's smoothing algorithm, which requires O(N√n·w/ε) time, where n is the dimension of the problem. Our work utilizes the framework of area convexity introduced in [Sherman-FOCS'17] to obtain the best dependence on ε while breaking the infamous ℓ∞ …
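For reference, the standard mixed packing-covering feasibility problem (reconstructed from the usual definition rather than quoted from the paper) is:

```latex
% Mixed packing-covering LP (standard feasibility form; our paraphrase):
% given entrywise nonnegative matrices P (packing) and C (covering),
\text{find } x \ge 0 \ \text{ such that } \; P x \le \mathbf{1}
\quad \text{and} \quad C x \ge \mathbf{1}.
% A (1+\varepsilon)-approximate solution satisfies
% P x \le (1+\varepsilon)\mathbf{1} \ \text{ and } \ C x \ge \mathbf{1}.
```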
Targeted Adversarial Perturbations for Monocular Depth Prediction
Supplementary Materials
In Sec. 2, the robustness of perturbations against defenses is discussed. Additional implementation details that we could not fit into the main text due to space constraints are given in Sec. 3. We verify our claim that targeted adversarial perturbations are visually imperceptible in Sec. 4. More experimental results on changing the scale of the scene are provided in Sec. 5. In Sec. 6, the existence of successful adversarial attacks on indoor scenes (NYU-V2) is shown for a state-of-the-art indoor monocular depth prediction model. In Sec. 7, we examine how predictions behave when linear operations are applied to perturbations (the sum of two perturbations and linear scaling of a perturbation). Failure cases for the perturbations are analyzed in Sec. 8. Finally, in Sec. 9, more qualitative and quantitative results are provided for the experiments whose compressed versions are presented in the main text.
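The attack recipe underlying these experiments can be sketched as a small gradient-based optimisation (our illustration of a generic targeted perturbation against a depth network; the paper's exact objective, norms, and hyperparameters may differ, and the function and argument names are hypothetical):

```python
import torch

def targeted_depth_perturbation(model, image, target_depth,
                                steps=100, lr=1e-3, eps=2e-2):
    """Optimise an additive perturbation delta so that the depth network
    predicts target_depth on image + delta, while keeping delta small."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pred = model(image + delta)
        loss = torch.nn.functional.l1_loss(pred, target_depth)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():          # keep the perturbation imperceptible
            delta.clamp_(-eps, eps)
    return delta.detach()
```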
… ball go beyond the fact that strongly convex functions grow too fast (Lemma 3.1): there are provable oracle complexity …
Dear reviewers, we greatly appreciate your remarks and suggestions. We address the comments in the following. On "I don't think this is the case, since the primal-dual … Could the authors clarify this?": we will correct this accordingly. If we have an ɛ-optimal solution of Eq. (2) (i.e., Definition 4.1), we can read from it a solution (x, y, z) whose … On "Shouldn't the supremum be over w, instead of x?" (page 7, line 253): …