# Neural Networks

### Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient.

### Large Scale Structure of Neural Network Loss Landscapes

There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high dimensional \emph{wedges} that together form a large-scale, inter-connected structure and towards which optimization is drawn. Finally, we predict and demonstrate new counter-intuitive properties of the loss-landscape.

### Generative Probabilistic Novelty Detection with Adversarial Autoencoders

Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. We assume that training data is available to describe only the inlier distribution. Recent approaches primarily leverage deep encoder-decoder network architectures to compute a reconstruction error that is used to either compute a novelty score or to train a one-class classifier. While we too leverage a novel network of that kind, we take a probabilistic approach and effectively compute how likely it is that a sample was generated by the inlier distribution. We achieve this with two main contributions.

### Population Matching Discrepancy and Applications in Deep Learning

A differentiable estimation of the distance between two distributions based on samples is important for many deep learning tasks. One such estimation is maximum mean discrepancy (MMD). However, MMD suffers from its sensitive kernel bandwidth hyper-parameter, weak gradients, and large mini-batch size when used as a training objective. In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective. PMD is defined as the minimum weight matching of sample populations from each distribution, and we prove that PMD is a strongly consistent estimator of the first Wasserstein metric.

### On the Convergence Rate of Training Recurrent Neural Networks

How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima in training multi-layer neural networks? Why can they fit random labels even given non-convex and non-smooth architectures? Most existing theory only covers networks with one hidden layer, so can we go deeper? In this paper, we focus on recurrent neural networks (RNNs) which are multi-layer networks widely used in natural language processing. They are harder to analyze than feedforward neural networks, because the \emph{same} recurrent unit is repeatedly applied across the entire time horizon of length $L$, which is analogous to feedforward networks of depth $L$.

### M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

Learning to walk over a graph towards a target node for a given query and a source node is an important problem in applications such as knowledge base completion (KBC). It can be formulated as a reinforcement learning (RL) problem with a known state transition model. To overcome the challenge of sparse rewards, we develop a graph-walking agent called M-Walk, which consists of a deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN encodes the state (i.e., history of the walked path) and maps it separately to a policy and Q-values. In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.

### Fast AutoAugment

Data augmentation is an essential technique for improving generalization ability of deep learning models. Recently, AutoAugment \cite{cubuk2018autoaugment} has been proposed as an algorithm to automatically search for augmentation policies from a dataset and has significantly enhanced performances on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. In comparison to AutoAugment, the proposed algorithm speeds up the search time by orders of magnitude while achieves comparable performances on image recognition tasks with various models and datasets including CIFAR-10, CIFAR-100, SVHN, and ImageNet.

### Spectrally-normalized margin bounds for neural networks

This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized "spectral complexity": their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity. Papers published at the Neural Information Processing Systems Conference.

### The Expressive Power of Neural Networks: A View from the Width

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. We show a universal approximation theorem for width-bounded ReLU networks: width-(n 4) ReLU networks, where n is the input dimension, are universal approximators.

### Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

A key requirement in sequence to sequence processing is the modeling of long range dependencies. To this end, a vast majority of the state-of-the-art models use attention mechanism which is of O(n 2) complexity that leads to slow execution for long sequences. We introduce a new Shuffle-Exchange neural network model for sequence to sequence tasks which have O(log n) depth and O(n log n) total complexity. We show that this model is powerful enough to infer efficient algorithms for common algorithmic benchmarks including sorting, addition and multiplication. We evaluate our architecture on the challenging LAMBADA question answering dataset and compare it with the state-of-the-art models which use attention.