
Noisy Recurrent Neural Networks

Neural Information Processing Systems

We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states. Specifically, we consider RNNs that can be viewed as discretizations of stochastic differential equations driven by input data. This framework allows us to study the implicit regularization effect of general noise injection schemes by deriving an approximate explicit regularizer in the small noise regime. We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin. Sufficient conditions for global stability are obtained, highlighting the phenomenon of stochastic stabilization, where noise injection can improve stability during training. Our theory is supported by empirical results demonstrating that the resulting RNNs have improved robustness with respect to various input perturbations.
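To make the setup concrete, the hidden-state update can be read as an Euler–Maruyama step of an input-driven SDE. The NumPy sketch below illustrates this view under assumed choices: the tanh drift, step size, and noise scale are placeholders, not the paper's exact scheme.

```python
import numpy as np

def noisy_rnn_step(h, x, W_h, W_x, b, dt=0.1, sigma=0.1, rng=None):
    """One Euler-Maruyama step of an SDE-like recurrent update:
    h_{t+1} = h_t + f(h_t, x_t) * dt + sigma * sqrt(dt) * xi,
    with a tanh drift f and xi ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    drift = np.tanh(W_h @ h + W_x @ x + b)            # deterministic RNN dynamics
    noise = sigma * np.sqrt(dt) * rng.standard_normal(h.shape)
    return h + dt * drift + noise                     # noise injected into the hidden state

# Toy usage: unroll over a random input sequence.
rng = np.random.default_rng(0)
d_h, d_x, T = 8, 4, 20
W_h = rng.normal(size=(d_h, d_h)) / np.sqrt(d_h)
W_x = rng.normal(size=(d_h, d_x))
b = np.zeros(d_h)
h = np.zeros(d_h)
for x in rng.normal(size=(T, d_x)):
    h = noisy_rnn_step(h, x, W_h, W_x, b, rng=rng)
```

Scaling the noise by sqrt(dt) is what makes the update a discretization of an SDE rather than ad-hoc jitter; taking sigma to zero recovers the deterministic RNN.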


Accelerated Linearized Laplace Approximation for Bayesian Deep Learning

Neural Information Processing Systems

Laplace approximation (LA) and its linearized variant (LLA) enable effortless adaptation of pretrained deep neural networks to Bayesian neural networks. The generalized Gauss-Newton (GGN) approximation is typically introduced to improve their tractability. However, LA and LLA are still confronted with non-trivial inefficiency issues and must rely on Kronecker-factored, diagonal, or even last-layer approximate GGN matrices in practical use. These approximations are likely to harm the fidelity of learning outcomes. To tackle this issue, inspired by the connections between LLA and neural tangent kernels (NTKs), we develop a Nyström approximation to NTKs to accelerate LLA.
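As a rough illustration of the Nyström idea the method builds on, the sketch below forms a rank-m approximation of a kernel matrix from m landmark points. The RBF kernel here is a cheap stand-in for the NTK, and all names are illustrative rather than the paper's API.

```python
import numpy as np

def nystrom_approx(X, kernel, m, rng=None):
    """Factors of the Nystrom approximation K ~= K_nm @ pinv(K_mm) @ K_nm.T,
    built from m landmark points sampled from the n inputs."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(X.shape[0], size=m, replace=False)  # landmark subset
    K_nm = kernel(X, X[idx])                             # n x m cross-kernel
    K_mm = kernel(X[idx], X[idx])                        # m x m landmark kernel
    return K_nm, np.linalg.pinv(K_mm)

def rbf(A, B, gamma=0.5):
    """Stand-in kernel; in LLA this role is played by an (empirical) NTK."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

X = np.random.default_rng(0).normal(size=(200, 5))
K_nm, K_mm_inv = nystrom_approx(X, rbf, m=20)
K_hat = K_nm @ K_mm_inv @ K_nm.T                         # approximate 200 x 200 kernel
```

Only the n x m and m x m blocks are ever formed, which is the source of the speedup when m is much smaller than n.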


Supplementary Material for Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification A Preliminaries of Conformal Prediction

Neural Information Processing Systems

Here, we provide the basic idea of conformal prediction to aid understanding. To this end, we introduce the following example from [1]. Computing residuals on the same samples used to fit the model yields overly optimistic intervals; to avoid this, Split Conformal Regression (SCR) separates the samples used for training from those used for computing the residuals. Since the samples are i.i.d., for a new sample with each potential outcome, the corresponding confidence interval Ĉ satisfies the miscoverage rate α from Theorem A.1, i.e., P[Y(1) ∈ Ĉ(X)] ≥ 1 − α. Subgroup analysis methods with recursive partitioning have been widely studied based on regression trees (RTs) [2-5]. In these methods, subgroups (i.e., leaves in the tree structure) are constructed, and the treatment effect is estimated by the sample-mean estimator on the leaf containing the given covariates. To capture non-linearities such as interactions between treatment and covariates [6], a parametric model is integrated into regression trees for subgroup analysis [7].
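A minimal sketch of Split Conformal Regression as described above, assuming a scikit-learn-style model with fit/predict; the helper name and the finite-sample quantile correction follow the standard recipe, not necessarily the supplement's code.

```python
import numpy as np

def split_conformal_interval(model, X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
    """Split Conformal Regression: fit on one split, compute absolute
    residuals on a disjoint calibration split, and use their corrected
    (1 - alpha) empirical quantile as a symmetric interval half-width."""
    model.fit(X_train, y_train)
    residuals = np.abs(y_cal - model.predict(X_cal))     # calibration residuals
    n = len(residuals)
    # finite-sample corrected quantile level guaranteeing coverage >= 1 - alpha
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, level)
    pred = model.predict(np.atleast_2d(x_new))[0]
    return pred - q, pred + q                            # the interval C_hat(x_new)
```

Because the calibration residuals are exchangeable with the residual of a fresh i.i.d. sample, the returned interval covers the new outcome with probability at least 1 − α.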


Supplementary: Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

Neural Information Processing Systems

Due to the space limit in the main paper, we describe some additional implementation details here. Our experiment with ResNet-38 on the Cityscapes test set involves pre-training on Mapillary Vistas. Since Mapillary Vistas generally contains larger images than Cityscapes, we adjust the scale factor to lie within [0.5, 1.5] instead of [0.5, 2.0]. The total number of training iterations is set to 500K. For Cityscapes test evaluation, we include the validation set in model training, following previous works.
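For illustration, a random-scale augmentation step with the adjusted [0.5, 1.5] range might look like the following Pillow-based sketch; this is a hypothetical helper, not the authors' pipeline.

```python
import random
from PIL import Image

def random_scale(img, label, scale_range=(0.5, 1.5)):
    """Randomly rescale an image and its label map by a factor drawn
    uniformly from scale_range; (0.5, 1.5) mirrors the adjustment
    described for Mapillary Vistas pre-training."""
    s = random.uniform(*scale_range)
    w, h = img.size
    new_size = (int(w * s), int(h * s))
    img = img.resize(new_size, Image.BILINEAR)      # bilinear for images
    label = label.resize(new_size, Image.NEAREST)   # nearest-neighbor for label maps
    return img, label
```

Nearest-neighbor resampling on the label map matters: bilinear interpolation would blend class indices into meaningless intermediate values.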


Coupled Segmentation and Edge Learning via Dynamic Graph Propagation

Neural Information Processing Systems

Image segmentation and edge detection are both central problems in perceptual grouping. It is therefore interesting to study how these two tasks can be coupled to benefit each other. Indeed, segmentation can be easily transformed into contour edges to guide edge learning. However, the converse is nontrivial since general edges may not always form closed contours. In this paper, we propose a principled end-to-end framework for coupled edge and segmentation learning, where edges are leveraged as pairwise similarity cues to guide segmentation.
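The "easy" direction mentioned above, deriving contour edges from a segmentation, can be sketched in a few lines of NumPy: a pixel is marked as an edge whenever its label differs from a right or down neighbor. This is an illustrative construction, not necessarily the paper's exact transform.

```python
import numpy as np

def seg_to_edges(label_map):
    """Binary contour-edge map from a segmentation label map: a pixel is
    an edge if its label differs from the pixel to its right or below."""
    edges = np.zeros_like(label_map, dtype=bool)
    edges[:, :-1] |= label_map[:, :-1] != label_map[:, 1:]   # horizontal transitions
    edges[:-1, :] |= label_map[:-1, :] != label_map[1:, :]   # vertical transitions
    return edges

seg = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 1]])
print(seg_to_edges(seg).astype(int))
```

The converse direction has no such one-liner, which is exactly why the paper treats edges as pairwise similarity cues rather than trying to close them into regions directly.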


Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

Neural Information Processing Systems

Probabilistic integral circuits (PICs) have been recently introduced as probabilistic models enjoying the key ingredient behind expressive generative models: continuous latent variables (LVs). PICs are symbolic computational graphs defining continuous LV models as hierarchies of functions that are summed and multiplied together, or integrated over some LVs. They are tractable if the LVs can be analytically integrated out; otherwise, they can be approximated by tractable probabilistic circuits (PCs) encoding a hierarchical numerical quadrature process, called QPCs. So far, only tree-shaped PICs have been explored, and training them via numerical quadrature requires memory-intensive processing at scale. In this paper, we address these issues and present: (i) a pipeline for building DAG-shaped PICs out of arbitrary variable decompositions, (ii) a procedure for training PICs using tensorized circuit architectures, and (iii) neural functional sharing techniques to allow scalable training.
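To illustrate the quadrature idea behind QPCs in the simplest possible setting, the sketch below approximates a one-dimensional latent-variable marginal p(x) = ∫ p(x|z) p(z) dz with a midpoint rule. The Gaussian densities are stand-ins, and real QPCs encode a hierarchical, tensorized version of this process rather than a single flat grid.

```python
import numpy as np
from scipy.stats import norm

def quadrature_marginal(x, n_points=64, z_lo=-5.0, z_hi=5.0):
    """Approximate p(x) = integral of p(x|z) p(z) dz by a midpoint rule,
    with p(z) = N(0, 1) and p(x|z) = N(z, 0.5^2) as stand-in densities."""
    z = np.linspace(z_lo, z_hi, n_points)        # quadrature nodes
    w = (z_hi - z_lo) / n_points                 # uniform quadrature weights
    integrand = norm.pdf(x, loc=z, scale=0.5) * norm.pdf(z)
    return np.sum(w * integrand)                 # finite mixture over the nodes

# Sanity check: the exact marginal is N(0, 1 + 0.25).
print(quadrature_marginal(0.3))
print(norm.pdf(0.3, scale=np.sqrt(1.25)))
```

The quadrature turns the intractable integral into a finite mixture, i.e., a sum unit over n_points weighted components, which is precisely the kind of structure a tractable circuit can represent.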




Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Neural Information Processing Systems

Embodied agents operating in human spaces must be able to master how their environment works: what objects can the agent use, and how can it use them? We introduce a reinforcement learning approach for exploration for interaction, whereby an embodied agent autonomously discovers the affordance landscape of a new unmapped 3D environment (such as an unfamiliar kitchen). Given an egocentric RGB-D camera and a high-level action space, the agent is rewarded for maximizing successful interactions while simultaneously training an image-based affordance segmentation model. The former yields a policy for acting efficiently in new environments to prepare for downstream interaction tasks, while the latter yields a convolutional neural network that maps image regions to the likelihood they permit each action, densifying the rewards for exploration. We demonstrate our idea with AI2-iTHOR. The results show that agents learn how to use new home environments intelligently, and that this preparation enables them to rapidly address various downstream tasks like "find a knife and put it in the drawer."
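As a caricature of the exploration objective, the sketch below rewards the first successful occurrence of each (object, action) interaction. The identifiers are hypothetical, and the actual agent additionally uses its affordance segmentation model to densify these sparse rewards.

```python
def interaction_reward(obj_id, action, interactions_seen, success):
    """Count-based sketch of an interaction-exploration reward: +1 the
    first time a given (object, action) pair succeeds, 0 otherwise."""
    if success and (obj_id, action) not in interactions_seen:
        interactions_seen.add((obj_id, action))   # mark interaction as discovered
        return 1.0
    return 0.0

seen = set()
print(interaction_reward("knife_1", "take", seen, success=True))   # 1.0 (novel)
print(interaction_reward("knife_1", "take", seen, success=True))   # 0.0 (repeat)
```

Because such rewards are sparse in a large environment, predicting which image regions afford which actions gives the agent a dense learned signal to steer its exploration toward promising objects.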


15825aee15eb335cc13f9b559f166ee8-AuthorFeedback.pdf

Neural Information Processing Systems

"Eqn 1: Reward depends on history of the state." The recurrent policy network encodes the agent's observation history over time to arrive at a state representation. Novelty rewards for visual exploration for mapping [57, 51, 7] are formulated similarly with RNNs.

"Approach tries every single object. Taking a knife/apple is the same... # affordances is very low." This number is defined by the AI2-iTHOR environments and is in no way limited by our approach.