Neural Networks


Biomedical Image Segmentation: U-Net

#artificialintelligence

Image Classification tells us what is contained in an image; the goal is to answer "is there a cat in this image?" Object Detection goes further and specifies the location of objects in the image, answering "where is the cat in this image?" Image Segmentation goes further still, producing a pixel-wise mask for each object in the image.
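
To make the pixel-wise mask idea concrete, here is a minimal sketch of the U-Net-style encoder/decoder pattern in PyTorch. Channel counts, depth, and layer names are illustrative assumptions, not the published U-Net configuration.

# Minimal U-Net-style encoder/decoder sketch (assumed layer sizes, not the
# original configuration). The output is a per-pixel mask logit map.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)            # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, n_classes, 1)   # 1x1 conv -> pixel-wise logits

    def forward(self, x):
        e1 = self.enc1(x)                                     # full-resolution features
        e2 = self.enc2(self.pool(e1))                         # downsampled features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))   # skip connection
        return self.head(d1)                                  # (N, n_classes, H, W)

mask_logits = TinyUNet()(torch.randn(1, 1, 128, 128))   # -> torch.Size([1, 1, 128, 128])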


Baidu Releases PaddlePaddle Upgrades

#artificialintelligence

An updated release of Baidu's deep learning framework includes a batch of new features, ranging from inference capabilities for Internet of Things (IoT) applications to a natural language processing (NLP) framework for Mandarin. The latest version of PaddlePaddle, released this week, includes a streamlined toolkit dubbed Paddle Lite 2.0 aimed at inference on IoT, embedded and mobile devices. It works with PaddlePaddle as well as pre-trained models from other sources, the Chinese Internet giant (NASDAQ: BIDU) said. Along with faster deployment of ResNet-50, used for image classification with convolutional neural networks, Paddle Lite 2.0 also supports edge-based FPGAs and other hardware. New development kits include ERNIE 2.0, an updated version of Baidu's natural language processing framework.


Learning to Reason with Third Order Tensor Products

Neural Information Processing Systems

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. This improves symbolic interpretation and systematic generalisation. Our architecture is trained end-to-end through gradient descent on a variety of simple natural language reasoning tasks, significantly outperforming the latest state-of-the-art models in single-task and all-tasks settings. We also augment a subset of the data such that training and test data exhibit large systematic differences and show that our approach generalises better than the previous state-of-the-art.
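
For readers unfamiliar with Tensor Product Representations, the numpy sketch below illustrates only the generic role-filler binding idea they build on (outer-product binding, contraction unbinding); it is not the paper's TPR-RNN architecture, and all dimensions are assumptions.

# Generic role-filler binding behind Tensor Product Representations.
import numpy as np

rng = np.random.default_rng(0)
d_role, d_filler = 8, 16

# Orthonormal role vectors make unbinding exact.
roles = np.linalg.qr(rng.standard_normal((d_role, d_role)))[0]
fillers = rng.standard_normal((3, d_filler))

# Bind each (role, filler) pair with an outer product and superpose them.
memory = sum(np.outer(roles[i], fillers[i]) for i in range(3))   # (d_role, d_filler)

# Unbind: contracting the memory with a role vector recovers its filler.
recovered = roles[1] @ memory
print(np.allclose(recovered, fillers[1]))   # True (up to numerical precision)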


Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization

Neural Information Processing Systems

Training deep neural networks requires an exorbitant amount of computation resources, including a heterogeneous mix of GPU and CPU devices. It is critical to place operations in a neural network on these devices in an optimal way, so that the training process can complete within the shortest amount of time. The state-of-the-art uses reinforcement learning to learn placement skills by repeatedly performing Monte-Carlo experiments. However, due to its equal treatment of placement samples, we argue that there remains ample room for significant improvements. In this paper, we propose a new joint learning algorithm, called Post, that integrates cross-entropy minimization and proximal policy optimization to achieve theoretically guaranteed optimal efficiency.
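
As a rough illustration of the cross-entropy-minimization half of this idea, the sketch below searches for a placement by sampling from a per-operation device distribution, keeping the fastest samples, and moving the distribution toward them. The cost model is a made-up toy, and the paper's Post algorithm additionally interleaves proximal policy optimization updates.

# Cross-entropy-method sketch for device placement (toy cost model, assumed).
import numpy as np

rng = np.random.default_rng(0)
n_ops, n_devices = 6, 2

def runtime(placement):
    # Hypothetical cost: per-op compute plus a penalty for cross-device edges.
    compute = placement.sum() * 0.1 + 1.0
    comm = np.sum(placement[:-1] != placement[1:]) * 0.5
    return compute + comm

probs = np.full((n_ops, n_devices), 1.0 / n_devices)   # placement distribution
for it in range(50):
    samples = np.array([[rng.choice(n_devices, p=probs[i]) for i in range(n_ops)]
                        for _ in range(64)])
    costs = np.array([runtime(s) for s in samples])
    elites = samples[np.argsort(costs)[:8]]             # keep the fastest placements
    for i in range(n_ops):                              # move probabilities toward elites
        counts = np.bincount(elites[:, i], minlength=n_devices)
        probs[i] = 0.7 * probs[i] + 0.3 * counts / counts.sum()

best = probs.argmax(axis=1)
print(best, runtime(best))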


Learning Robust Global Representations by Penalizing Local Predictive Power

Neural Information Processing Systems

Despite their renowned in-domain predictive power, convolutional neural networks are known to rely more on high-frequency patterns that humans deem superficial than on low-frequency patterns that agree better with intuitions about what constitutes category membership. This paper proposes a method for training robust convolutional networks by penalizing the predictive power of the local representations learned by earlier layers. Intuitively, our networks are forced to discard predictive signals such as color and texture that can be gleaned from local receptive fields and to rely instead on the global structures of the image. Across a battery of synthetic and benchmark domain adaptation tasks, our method confers improved generalization out of the domain. Additionally, to evaluate cross-domain transfer, we introduce ImageNet-Sketch, a new dataset consisting of sketch-like images that matches the ImageNet classification validation set in scale and dimension.
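
One generic way to realize "penalize the predictive power of local representations" is to attach a per-location classifier to an early layer and reverse its gradient into the backbone, as in the PyTorch sketch below. This follows the idea described in the abstract under assumed layer sizes; it is not necessarily the paper's exact training recipe.

# Penalize how predictive early local features are: a 1x1-conv classifier predicts
# the image label at every spatial position, and a gradient-reversal op pushes the
# backbone to make those local features less predictive.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g      # backbone receives the negated gradient

backbone_early = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
local_head = nn.Conv2d(32, 10, 1)          # 1x1 conv = classifier per spatial location
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
feats = backbone_early(x)                               # (4, 32, 32, 32)
local_logits = local_head(GradReverse.apply(feats))     # (4, 10, 32, 32)
# Each spatial position tries to predict the image label; the reversed gradient
# penalizes the backbone whenever local patches alone suffice.
local_loss = criterion(local_logits, y.view(-1, 1, 1).repeat(1, 32, 32))
local_loss.backward()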


Adaptive dropout for training deep neural networks

Neural Information Processing Systems

Recently, it was shown that deep neural networks can perform very well when hidden activities are dropped out with a probability of 0.5. We describe a model in which a binary belief network is overlaid on a neural network and is used to decrease the information content of its hidden units by selectively setting activities to zero. This 'dropout network' can be trained jointly with the neural network by approximately computing local expectations of binary dropout variables, computing derivatives using back-propagation, and using stochastic gradient descent. Interestingly, experiments show that the learnt dropout network parameters recapitulate the neural network parameters, suggesting that a good dropout network regularizes activities according to magnitude. When evaluated on the MNIST and NORB datasets, we found that our method can achieve lower classification error rates than other feature learning methods, including standard dropout, denoising auto-encoders, and restricted Boltzmann machines.
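
A minimal sketch of the overlaid-belief-network idea: the keep probability of each hidden unit is itself a sigmoid function of that unit's pre-activation, and a Bernoulli mask is sampled from it during training. The alpha/beta constants and layer sizes below are illustrative assumptions, not the paper's settings.

# Adaptive dropout sketch: a per-unit keep probability computed from the same
# pre-activations, with the expected mask used at test time.
import torch
import torch.nn as nn

class AdaptiveDropoutLayer(nn.Module):
    def __init__(self, d_in, d_out, alpha=1.0, beta=0.0):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.alpha, self.beta = alpha, beta

    def forward(self, x):
        pre = self.linear(x)                                      # hidden pre-activations
        keep_prob = torch.sigmoid(self.alpha * pre + self.beta)   # belief network output
        h = torch.relu(pre)
        if self.training:
            mask = torch.bernoulli(keep_prob)      # stochastic binary dropout variables
            return h * mask
        return h * keep_prob                       # expectation at test time

layer = AdaptiveDropoutLayer(784, 256)
out = layer(torch.randn(32, 784))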


Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games

Neural Information Processing Systems

We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games. In this class, players control the inputs of a smooth function whose output is being applied to a bilinear zero-sum game. This class of games is motivated by the indirect nature of the competition in Generative Adversarial Networks, where players control the parameters of a neural network while the actual competition happens between the distributions that the generator and discriminator capture. We establish theoretically that, depending on the specific instance of the problem, gradient-descent-ascent dynamics can exhibit a variety of behaviors antithetical to convergence to the game-theoretically meaningful min-max solution. Specifically, different forms of recurrent behavior (including periodicity and Poincaré recurrence) are possible, as well as convergence to spurious (non-min-max) equilibria for a positive measure of initial conditions.
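
The non-convergence phenomenon is easy to see numerically even on the simplest bilinear zero-sum game f(x, y) = x*y, where simultaneous gradient descent-ascent orbits around the equilibrium (0, 0) rather than converging to it. The toy simulation below illustrates that behavior; it is not an experiment from the paper.

# Simultaneous gradient descent-ascent on f(x, y) = x*y: the iterates spiral
# around the min-max solution (0, 0) and their distance to it never shrinks.
import numpy as np

x, y, lr = 1.0, 1.0, 0.1
radii = []
for t in range(200):
    gx, gy = y, x                    # df/dx = y, df/dy = x
    x, y = x - lr * gx, y + lr * gy  # min player descends, max player ascends
    radii.append(np.hypot(x, y))

print(radii[0], radii[-1])   # the distance to (0, 0) slowly grows: no convergence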


Can SGD Learn Recurrent Neural Networks with Provable Generalization?

Neural Information Processing Systems

Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. Yet, in the foundational PAC learning language, what concept class can they learn? Moreover, how can the same recurrent unit simultaneously learn functions from different input tokens to different output tokens without them affecting each other? In this paper, we show that, using vanilla stochastic gradient descent (SGD), an RNN can actually learn some notable concept class efficiently, meaning that both time and sample complexity scale polynomially in the input length (or almost polynomially, depending on the concept). This concept class at least includes functions where each output token is generated from the inputs of earlier tokens using a smooth two-layer neural network.
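
To make the concept class concrete, the sketch below builds one such target (each output token is a fixed smooth two-layer network applied to an earlier input token) and fits a vanilla RNN with plain SGD. The dimensions, the one-step lag, and the training budget are all illustrative assumptions.

# A member of the concept class described above, learned by a vanilla RNN + SGD.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, T = 8, 10
W1, W2 = torch.randn(16, d), torch.randn(1, 16)       # fixed "teacher" two-layer net

def concept(x):                                        # x: (batch, T, d)
    h = torch.tanh(x[:, :-1] @ W1.T)                   # smooth activation
    return (h @ W2.T).squeeze(-1)                      # targets y_2 .. y_T

rnn = nn.RNN(d, 32, batch_first=True)
head = nn.Linear(32, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.05)

for step in range(2000):
    x = torch.randn(64, T, d)
    y = concept(x)
    out, _ = rnn(x)
    pred = head(out[:, 1:]).squeeze(-1)                # y_t predicted from prefix x_1..x_t
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()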


On the Expressive Power of Deep Polynomial Neural Networks

Neural Information Processing Systems

We study deep neural networks with polynomial activations, particularly their expressive power. For a fixed architecture and activation degree, a polynomial neural network defines an algebraic map from weights to polynomials. The image of this map is the functional space associated to the network, and it is an irreducible algebraic variety upon taking closure. This paper proposes the dimension of this variety as a precise measure of the expressive power of polynomial neural networks. We obtain several theoretical results regarding this dimension as a function of architecture, including an exact formula for high activation degrees, as well as upper and lower bounds on layer widths in order for deep polynomial networks to fill the ambient functional space.
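
For intuition, here is a short formal sketch of the objects involved, with notation assumed for illustration rather than quoted from the paper: a depth-L network with power activation sends its weights to a tuple of homogeneous polynomials of degree r^(L-1), and expressivity is measured by the dimension of the closure of this image.

% Weight-to-polynomial map of a polynomial network with activation sigma(z) = z^r:
\[
  p_{\theta}(x) \;=\; W_L\,\sigma\!\big(W_{L-1}\,\sigma(\cdots \sigma(W_1 x)\cdots)\big),
  \qquad \sigma(z) = z^{r}, \qquad \theta = (W_1,\dots,W_L).
\]
% The functional space of the architecture is the closure of the image of this map,
% and its dimension is the proposed measure of expressive power:
\[
  \mathcal{V} \;=\; \overline{\{\, p_{\theta} : \theta \,\}},
  \qquad \text{expressive power} \;:=\; \dim \mathcal{V}.
\]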


Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations

Neural Information Processing Systems

Learning representations that accurately capture long-range dependencies in sequential inputs, including text, audio, and genomic data, is a key problem in deep learning. Feed-forward convolutional models capture only feature interactions within finite receptive fields, while recurrent architectures can be slow and difficult to train due to vanishing gradients. Here, we propose Temporal Feature-Wise Linear Modulation (TFiLM), a novel architectural component inspired by adaptive batch normalization and its extensions, that uses a recurrent neural network to alter the activations of a convolutional model. This approach expands the receptive field of convolutional sequence models with minimal computational overhead. Empirically, we find that TFiLM significantly improves the learning speed and accuracy of feed-forward neural networks on a range of generative and discriminative learning tasks, including text classification and audio super-resolution.
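
The sketch below shows one way to realize the idea described above in PyTorch: pool a 1-D convolutional feature map into blocks, run an RNN over the block summaries, and use its outputs as per-block scale and shift on the activations. Block size, channel counts, and the choice of LSTM are illustrative assumptions, not the paper's exact module.

# Temporal feature-wise linear modulation sketch: an RNN over block summaries
# produces per-block (gamma, beta) that modulate the conv activations.
import torch
import torch.nn as nn

class TFiLM1d(nn.Module):
    def __init__(self, channels, block_size):
        super().__init__()
        self.block_size = block_size
        self.pool = nn.MaxPool1d(block_size)
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.to_gamma_beta = nn.Linear(channels, 2 * channels)

    def forward(self, x):                         # x: (batch, channels, time)
        b, c, t = x.shape
        n_blocks = t // self.block_size
        summary = self.pool(x).transpose(1, 2)    # (batch, n_blocks, channels)
        rnn_out, _ = self.rnn(summary)            # long-range context across blocks
        gamma, beta = self.to_gamma_beta(rnn_out).chunk(2, dim=-1)
        blocks = x.view(b, c, n_blocks, self.block_size)
        gamma = gamma.transpose(1, 2).unsqueeze(-1)   # (batch, channels, n_blocks, 1)
        beta = beta.transpose(1, 2).unsqueeze(-1)
        return (gamma * blocks + beta).view(b, c, t)

y = TFiLM1d(channels=16, block_size=8)(torch.randn(2, 16, 64))   # -> (2, 16, 64)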