Goto

Collaborating Authors

 Deep Learning


5 Machine Learning Projects You Can No Longer Overlook, May

@machinelearnbot

More overlooked machine learning and/or machine learning-related projects? OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. From Intel comes a(nother) deep learning framework, optimized for distribution over Apache Spark. BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.


Recycling Deep Learning Models with Transfer Learning

@machinelearnbot

Deep learning models are indisputably the state of the art for many problems in machine perception. Using neural networks with many hidden layers of artificial neurons and millions of parameters, deep learning algorithms exploit both hardware capabilities and the abundance of gigantic datasets. In comparison, linear models saturate quickly and under-fit. But what if you don't have big data? Say you have a novel research problem, such as identifying cancerous moles given photographs.


First Artificial Intelligence Visual Search For UK Arts Sector

#artificialintelligence

A ground-breaking artificial intelligence (AI) visual search tool – the first within the UK arts space – has been launched by AI specialist Visii for digital image licensing platform Artimage. Disrupting keyword search, Visii's intelligent visual search tool gives users the power to discover artworks they'll love based on their visual preferences. Users can focus on a subject, an artwork's composition or an artist's style, without needing to describe it or know of its existence; meaning discovery of hidden gems now becomes possible and easy. Visii uses advanced artificial intelligence methods to infer users' interests and generate visually relevant results. Its deep learning technology analyses thousands of visual descriptors within each image, the comparison of which is incredibly fast, returning results from large catalogues in less than 350 milliseconds.


On-the-fly Operation Batching in Dynamic Computation Graphs

arXiv.org Machine Learning

Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits - both static and dynamic - require that the developer organize the computations into the batches necessary for exploiting high-performance algorithms and hardware. This batching task is generally difficult, but it becomes a major hurdle as architectures become complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single instance computations, and the batching algorithm seamlessly executes them, on the fly, using computationally efficient batched operations. On a variety of tasks, we obtain throughput similar to that obtained with manual batches, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually.


Concrete Dropout

arXiv.org Machine Learning

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary-- a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.


Regularizing deep networks using efficient layerwise adversarial training

arXiv.org Machine Learning

Adversarial training has been shown to regularize deep neural networks in addition to increasing their robustness to adversarial examples. However, its impact on very deep state of the art networks has not been fully investigated. In this paper, we present an efficient approach to perform adversarial training by perturbing intermediate layer activations and study the use of such perturbations as a regularizer during training. We use these perturbations to train very deep models such as ResNets and show improvement in performance both on adversarial and original test data. Our experiments highlight the benefits of perturbing intermediate layer activations compared to perturbing only the inputs. The results on CIFAR-10 and CIFAR-100 datasets show the merits of the proposed adversarial training approach. Additional results on WideResNets show that our approach provides significant improvement in classification accuracy for a given base model, outperforming dropout and other base models of larger size.


Approximate Inference with Amortised MCMC

arXiv.org Machine Learning

We propose a novel approximate inference framework that approximates a target distribution by amortising the dynamics of a user-selected Markov chain Monte Carlo (MCMC) sampler. The idea is to initialise MCMC using samples from an approximation network, apply the MCMC operator to improve these samples, and finally use the samples to update the approximation network thereby improving its quality. This provides a new generic framework for approximate inference, allowing us to deploy highly complex, or implicitly defined approximation families with intractable densities, including approximations produced by warping a source of randomness through a deep neural network. Experiments consider Bayesian neural network classification and image modelling with deep generative models. Deep models trained using amortised MCMC are shown to generate realistic looking samples as well as producing diverse imputations for images with regions of missing pixels.


Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

arXiv.org Machine Learning

In a previous work we have detailed the requirements to obtain a maximal performance benefit by implementing fully connected deep neural networks (DNN) in form of arrays of resistive devices for deep learning. This concept of Resistive Processing Unit (RPU) devices we extend here towards convolutional neural networks (CNNs). We show how to map the convolutional layers to RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm. We find that the noise and bound limitations imposed due to analog nature of the computations performed on the arrays effect the training accuracy of the CNNs. Noise and bound management techniques are presented that mitigate these problems without introducing any additional complexity in the analog circuits and can be addressed by the digital circuits. In addition, we discuss digitally programmable update management and device variability reduction techniques that can be used selectively for some of the layers in a CNN. We show that combination of all those techniques enables a successful application of the RPU concept for training CNNs. The techniques discussed here are more general and can be applied beyond CNN architectures and therefore enables applicability of RPU approach for large class of neural network architectures.


Solving ill-posed inverse problems using iterative deep neural networks

arXiv.org Artificial Intelligence

We propose a partially learned approach for the solution of ill posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the "gradient" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Sheep-Logan phantom as well as a head CT. The outcome is compared against FBP and TV reconstruction and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.


Deep adversarial learning is finally ready and will radically change the game

#artificialintelligence

Adversarial learning allows us to free our models of any constraints or limitations in our understanding of the problem domain -- there is no preconception of what to learn and the model is free to explore the data. In the next post we will see how we can utilize the representations learned by our generator for image classification.