Deep Learning
Artificial intelligence is a part of your life
Artificial Intelligence has emerged from the realm of science fiction and become a part of the millennial way of life. The factory of the future is taking shape in Japan and it is being built on an AI platform. Here, robots will learn from each other and every robot will have an embedded GPU to perform real-time AI. In 2016, AlphaGo, a computer programme from Google's DeepMind, had a historic win over the world's most celebrated Go champion, Lee Sedol of South Korea. The same year, Microsoft achieved human parity in speech recognition.
Google forms Montreal AI research group, gives $3.37 million grant to Yoshua Bengio, others
Google is announcing today that it's setting up a deep learning and artificial intelligence (AI) research unit in its office in Montreal and giving $3.37 million in grant money to deep learning luminary Yoshua Bengio and seven other people associated with the Montreal Institute for Learning Algorithms (MILA). Bengio himself has previously received backing from Google, and from other companies as well -- namely, IBM, Samsung, and Intel. But the new grant is "bigger than any of the other funding we've received from private companies up until now," he said during an interview with VentureBeat. Bengio will not be formally allying himself with Google proper, because he wants to stay independent. "That's who I am," he said, "that's the choice I made that fits with my values, and I don't need to get the millions, I'm fine. My salary is very good, and I care more about how what I can do could have a positive impact for science, humanity, and for training the next generation [of researchers]."
Compressing and regularizing deep neural networks
Deep neural networks have evolved to be the state-of-the-art technique for machine learning tasks ranging from computer vision and speech recognition to natural language processing. However, deep learning algorithms are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, deep compression significantly reduces the computation and storage required by neural networks. For example, for a convolutional neural network with fully connected layers, such as AlexNet and VGGNet, it can reduce the model size by 35x-49x. Even for fully convolutional neural networks such as GoogLeNet and SqueezeNet, deep compression can still reduce the model size by 10x.
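The excerpt above does not say how deep compression achieves these ratios. As published, the technique combines weight pruning, quantization, and Huffman coding; the sketch below illustrates only the first ingredient, magnitude-based pruning, on a toy weight list. The function names, the threshold, and the weight values are all hypothetical, and the index-to-value storage is just one simple way to hold the surviving weights.

```python
from typing import Dict, List


def prune_by_magnitude(weights: List[float], threshold: float) -> Dict[int, float]:
    """Keep only weights whose absolute value exceeds threshold (index -> value)."""
    return {i: w for i, w in enumerate(weights) if abs(w) > threshold}


def densify(sparse: Dict[int, float], length: int) -> List[float]:
    """Rebuild the dense weight list, with pruned positions set to zero."""
    return [sparse.get(i, 0.0) for i in range(length)]


# Toy weights (hypothetical); most are close to zero, as is typical after training.
weights = [0.8, -0.02, 0.01, -0.6, 0.03, 0.0, 0.45, -0.04]

kept = prune_by_magnitude(weights, threshold=0.1)
kept_fraction = len(kept) / len(weights)  # share of weights that survive pruning
restored = densify(kept, len(weights))
```

Here three of the eight weights survive. The real storage saving is smaller than the kept fraction suggests, because the indices of the surviving weights must be stored as well.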
Google opens new AI lab and invests $3.4M in Montreal-based AI research
Google has invested a total of $4.5 million CAD ($3.4M US) in AI research in Montreal's Institute for Learning Algorithms, with an academic fund covering three years that will help pay for seven faculty members across various Montreal academic institutions, including the University of Montreal and McGill. The investment is also continued backing for deep learning expert Yoshua Bengio's work, and is part of Google's continued bet on Canada's strong expertise in machine learning and AI research, both of which are becoming increasingly important to its core business. To that end, along with the investment, Google is also opening a brand new deep learning and AI research group in Montreal at its existing office in the city. The new team will be a remote arm of its Google Brain team based in Mountain View, and will be led locally by Hugo Larochelle, a deep learning expert who's returning home to Montreal from a role with Twitter in Boston specifically for the new position. Google notes that its total investment in academic research in Canada to date now amounts to around $13 million Canadian over the past 10 years, and it hopes that the new investment will help with the ongoing formation of an AI supercluster in Montreal, which is becoming a hotbed for AI startups as well as academic research.
Interpretation of Prediction Models Using the Input Gradient
State-of-the-art machine learning algorithms are highly optimized to provide the best possible predictions, naturally resulting in complex models. While these models often outperform simpler, more interpretable models by orders of magnitude, when it comes to understanding how the model functions we often face a "black box". In this paper we suggest a simple method to interpret the behavior of any predictive model, both for regression and classification. Given a particular model, the information required to interpret it can be obtained by studying the partial derivatives of the model with respect to the input.
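The idea of studying partial derivatives with respect to the input can be sketched for a black-box predictor using central finite differences. This is a minimal illustration, not the paper's procedure: `input_gradient`, `toy_model`, and the step size `eps` are assumptions introduced here, and a differentiable model would normally use automatic differentiation instead.

```python
from typing import Callable, List, Sequence


def input_gradient(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    eps: float = 1e-5,
) -> List[float]:
    """Estimate the partial derivative of model with respect to each input at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        # Central difference: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
        grad.append((model(up) - model(down)) / (2.0 * eps))
    return grad


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained predictor: 3*x0 + x1**2.
    return 3.0 * x[0] + x[1] ** 2


# Sensitivity of the prediction to each feature at the point (1.0, 2.0).
gradient = input_gradient(toy_model, [1.0, 2.0])
```

For this toy model the analytic gradient at (1.0, 2.0) is (3, 4), so the estimate should be close to those values: the prediction is locally more sensitive to the second feature than to the first.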
Feature Importance Measure for Non-linear Learning Algorithms
Vidovic, Marina M.-C., Görnitz, Nico, Müller, Klaus-Robert, Kloft, Marius
Complex problems may require sophisticated, non-linear learning methods such as kernel machines or deep neural networks to achieve state-of-the-art prediction accuracies. However, high prediction accuracy is not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. Unfortunately, most methods do not come with a straightforward, out-of-the-box interpretation. Even linear prediction functions are not straightforward to explain if features exhibit a complex correlation structure. In this paper, we propose the Measure of Feature Importance (MFI). MFI is general and can be applied to any arbitrary learning machine (including kernel machines and deep learning). MFI is intrinsically non-linear and can detect features that by themselves are inconspicuous and only impact the prediction function through their interaction with other features. Lastly, MFI can be used for both model-based feature importance and instance-based feature importance (i.e., measuring the importance of a feature for a particular data point).
Inducing Interpretable Representations with Variational Autoencoders
Siddharth, N., Paige, Brooks, Desmaison, Alban, Van de Meent, Jan-Willem, Wood, Frank, Goodman, Noah D., Kohli, Pushmeet, Torr, Philip H. S.
We develop a framework for incorporating structured graphical models in the encoders of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high-dimensional domains where it is often difficult to model all the variation. Learning in this framework is carried out end-to-end with a variational objective, applying to both unsupervised and semi-supervised schemes.
TreeView: Peeking into Deep Neural Networks Via Feature-Space Partitioning
Thiagarajan, Jayaraman J., Kailkhura, Bhavya, Sattigeri, Prasanna, Ramamurthy, Karthikeyan Natesan
With the advent of highly predictive but opaque deep learning models, it has become more important than ever to understand and explain the predictions of such models. Existing approaches define interpretability as the inverse of complexity and achieve interpretability at the cost of accuracy. This introduces a risk of producing interpretable but misleading explanations. As humans, we are prone to engage in this kind of behavior \cite{mythos}. In this paper, we take a step in the direction of tackling the problem of interpretability without compromising the model accuracy. We propose to build a TreeView representation of the complex model via hierarchical partitioning of the feature space, which reveals the iterative rejection of unlikely class labels until the correct association is predicted.
Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery
Wisdom, Scott, Powers, Thomas, Pitton, James, Atlas, Les
Recurrent neural networks (RNNs) are powerful and effective for processing sequential data. However, RNNs are usually considered "black box" models whose internal structure and learned parameters are not interpretable. In this paper, we propose an interpretable RNN based on the sequential iterative soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery problem, which models a sequence of correlated observations with a sequence of sparse latent vectors. The architecture of the resulting SISTA-RNN is implicitly defined by the computational structure of SISTA, which results in a novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are perfectly interpretable as the parameters of a principled statistical model, which in this case include a sparsifying dictionary, iterative step size, and regularization parameters. In addition, on a particular sequential compressive sensing task, the SISTA-RNN trains faster and achieves better performance than conventional state-of-the-art black box RNNs, including long short-term memory (LSTM) RNNs.
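The abstract does not spell out SISTA itself. As a rough orientation, the sketch below shows plain (non-sequential) iterative soft-thresholding for a single observation, minimizing 0.5 * ||x - D h||^2 + lam * ||h||_1 over a sparse code h. It omits the temporal coupling between consecutive latent vectors that the sequential variant adds. The names `ista`, `soft_threshold`, `D`, `lam`, and `alpha`, and the identity dictionary in the example, are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def soft_threshold(v: Vector, t: float) -> Vector:
    """Shrink every entry toward zero by t; entries within [-t, t] become 0."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def ista(x: Vector, D: Matrix, lam: float, alpha: float, n_iter: int = 100) -> Vector:
    """Plain ISTA; alpha should be at least the largest eigenvalue of D^T D."""
    Dt = transpose(D)
    h = [0.0] * len(Dt)  # one coefficient per dictionary column
    for _ in range(n_iter):
        residual = [xi - yi for xi, yi in zip(x, matvec(D, h))]
        step = matvec(Dt, residual)  # negative gradient of the quadratic term
        h = soft_threshold([hi + si / alpha for hi, si in zip(h, step)], lam / alpha)
    return h


# Toy example with an identity dictionary (hypothetical values).
D = [[1.0, 0.0], [0.0, 1.0]]
h = ista(x=[3.0, 0.5], D=D, lam=1.0, alpha=1.0)
```

With an identity dictionary the solution is simply the soft-thresholded observation, so the small second entry is driven exactly to zero while the large first entry is shrunk by `lam`. Each loop iteration has the shape of a recurrent layer, which is the structural observation the SISTA-RNN builds on.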
Evolutionary Synthesis of Deep Neural Networks via Synaptic Cluster-driven Genetic Encoding
Shafiee, Mohammad Javad, Wong, Alexander
There has been significant recent interest in achieving highly efficient deep neural network architectures. A promising paradigm for achieving this is the concept of evolutionary deep intelligence, which attempts to mimic biological evolution processes to synthesize highly efficient deep neural networks over successive generations. An important aspect of evolutionary deep intelligence is the genetic encoding scheme used to mimic heredity, which can have a significant impact on the quality of offspring deep neural networks. Motivated by the neurobiological phenomenon of synaptic clustering, we introduce a new genetic encoding scheme where synaptic probability is driven towards the formation of a highly sparse set of synaptic clusters. Experimental results for the task of image classification demonstrated that the synthesized offspring networks using this synaptic cluster-driven genetic encoding scheme can achieve state-of-the-art performance while having network architectures that are not only significantly more efficient (with a ~125-fold decrease in synapses for MNIST) compared to the original ancestor network, but also tailored for GPU-accelerated machine learning applications.