Deep Learning
Implementation of a Practical Distributed Calculation System with Browsers and JavaScript, and Application to Distributed Deep Learning
Deep learning can achieve outstanding results in various fields. However, it requires so significant computational power that graphics processing units (GPUs) and/or numerous computers are often required for the practical application. We have developed a new distributed calculation framework called "Sashimi" that allows any computer to be used as a distribution node only by accessing a website. We have also developed a new JavaScript neural network framework called "Sukiyaki" that uses general purpose GPUs with web browsers. Sukiyaki performs 30 times faster than a conventional JavaScript library for deep convolutional neural networks (deep CNNs) learning. The combination of Sashimi and Sukiyaki, as well as new distribution algorithms, demonstrates the distributed deep learning of deep CNNs only with web browsers on various devices. The libraries that comprise the proposed methods are available under MIT license at http://mil-tokyo.github.io/.
Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014
Doroshin, Danila, Yamshinin, Alexander, Lubimov, Nikolay, Nastasenko, Marina, Kotov, Mikhail, Tkachenko, Maxim
This paper presents a novel approach to speaker subspace modelling based on Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is based on the idea of shared factors as in the Probabilistic Linear Discriminant Analysis (PLDA). GRBM hidden layer is divided into speaker and channel factors, herein the speaker factor is shared over all vectors of the speaker. Then Maximum Likelihood Parameter Estimation (MLE) for proposed model is introduced. Various new scoring techniques for speaker verification using GRBM are proposed. The results for NIST i-vector Challenge 2014 dataset are presented.
Geometry and Expressive Power of Conditional Restricted Boltzmann Machines
Montufar, Guido, Ay, Nihat, Ghazi-Zahedi, Keyan
Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
Ren, Jimmy SJ. (Lenovo Research and Technology) | Xu, Li (Lenovo Research and Technology)
We recently have witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep network architectures, the vector processing hardware (e.g. GPU) undisputedly plays a vital role in modern CNN implementations to support massive computation. Though much attention was paid in the extent literature to understand the algorithmic side of deep CNN, little research was dedicated to the vectorization for scaling up CNNs. In this paper, we studied the vectorization process of key building blocks in deep CNNs, in order to better understand and facilitate parallel implementation. Key steps in training and testing deep CNNs are abstracted as matrix and vector operators, upon which parallelism can be easily achieved. We developed and compared six implementations with various degrees of vectorization with which we illustrated the impact of vectorization on the speed of model training and testing. Besides, a unified CNN framework for both high-level and low-level vision tasks is provided, along with a vectorized Matlab implementation with state-of-the-art speed performance.
Relational Stacked Denoising Autoencoder for Tag Recommendation
Wang, Hao (Hong Kong University of Science and Technology) | Shi, Xingjian (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology)
Tag recommendation has become one of the most important ways of organizing and indexing online resources like articles, movies, and music. Since tagging information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate tag recommendation. Recently, models proposed for tag recommendation, such as collaborative topic regression and its variants, have demonstrated promising accuracy. However, a limitation of these models is that, by using topic models like latent Dirichlet allocation as the key component, the learned representation may not be compact and effective enough. Moreover, since relational data exist as an auxiliary data source in many applications, it is desirable to incorporate such data into tag recommendation models. In this paper, we start with a deep learning model called stacked denoising autoencoder (SDAE) in an attempt to learn more effective content representation. We propose a probabilistic formulation for SDAE and then extend it to a relational SDAE (RSDAE) model. RSDAE jointly performs deep representation learning and relational learning in a principled way under a probabilistic framework. Experiments conducted on three real datasets show that both learning more effective representation and learning from relational data are beneficial steps to take to advance the state of the art.
Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network
Ding, Xiao (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Duan, Junwen (Harbin Institute of Technology) | Nie, Jian-Yun (University of Montreal)
Social media platforms are often used by people to express their needs and desires. Such data offer great opportunities to identify usersโ consumption intention from user-generated contents, so that better tailored products or services can be recommended. However, there have been few efforts on mining commercial intents from social media contents. In this paper, we investigate the use of social media data to identify consumption intentions for individuals. We develop a Consumption Intention Mining Model (CIMM) based on convolutional neural network (CNN), for identifying whether the user has a consumption intention. The task is domain-dependent, and learning CNN requires a large number of annotated instances, which can be available only in some domains. Hence, we investigate the possibility of transferring the CNN mid-level sentence representation learned from one domain to another by adding an adaptation layer. To demonstrate the effectiveness of CIMM, we conduct experiments on two domains. Our results show that CIMM offers a powerful paradigm for effectively identifying usersโ consumption intention based on their social media data. Moreover, our results also confirm that the CNN learned in one domain can be effectively transferred to another domain. This suggests that a great potential for our model to significantly increase effectiveness of product recommendations and targeted advertising.
Automated Construction of Visual-Linguistic Knowledge via Concept Learning from Cartoon Videos
Ha, Jung-Woo (Seoul National University) | Kim, Kyung-Min (Seoul National University) | Zhang, Byoung-Tak (Seoul National University)
Learning mutually-grounded vision-language knowledge is a foundational task for cognitive systems and human-level artificial intelligence. Most of knowledge-learning techniques are focused on single modal representations in a static environment with a fixed set of data. Here, we explore an ecologically more-plausible setting by using a stream of cartoon videos to build vision-language concept hierarchies continuously. This approach is motivated by the literature on cognitive development in early childhood. We present the model of deep concept hierarchy (DCH) that enables the progressive abstraction of concept knowledge in multiple levels. We develop a stochastic method for graph construction, i.e. a graph Monte Carlo algorithm, to search efficiently the huge compositional space of the vision-language concepts. The concept hierarchies are built incrementally and can handle concept drift, allowing for being deployed in lifelong learning environments. Using a series of approximately 200 episodes of educational cartoon videos we demonstrate the emergence and evolution of the concept hierarchies as the video stories unfold. We also present the application of the deep concept hierarchies for context-dependent translation between vision and language, i.e. the transcription of a visual scene into text and the generation of visual imagery from text.
Kickback Cuts Backprop's Red-Tape: Biologically Plausible Credit Assignment in Neural Networks
Balduzzi, David (Victoria University of Wellington) | Vanchinathan, Hastagiri (ETH Zurich) | Buhmann, Joachim (ETH Zurich)
Error backpropagation is an extremely effective algorithm for assigning credit in artificial neural networks. However, weight updates under Backprop depend on lengthy recursive computations and require separate output and error messages โ features not shared by biological neurons, that are perhaps unnecessary. In this paper, we revisit Backprop and the credit assignment problem. We first decompose Backprop into a collection of interacting learning algorithms; provide regret bounds on the performance of these sub-algorithms; and factorize Backprop's error signals. Using these results, we derive a new credit assignment algorithm for nonparametric regression, Kickback, that is significantly simpler than Backprop. Finally, we provide a sufficient condition for Kickback to follow error gradients, and show that Kickback matches Backprop's performance on real-world regression benchmarks.
DeepTutor: An Effective, Online Intelligent Tutoring System That Promotes Deep Learning
Rus, Vasile (The University of Memphis) | Niraula, Nobal (The University of Memphis) | Banjade, Rajendra (The University of Memphis)
We present in this paper an innovative solution to the challenge of building effective educational technologies that offer tailored instruction to each individual learner. The proposed solution in the form of a conversational intelligent tutoring system, called DeepTutor, has been developed as a web application that is accessible 24/7 through a browser from any device connected to the Internet. The success of several large scale experiments with high-school students using DeepTutor is a solid proof that conversational intelligent tutoring at scale over the web is possible.
Social Hierarchical Learning
Hayes, Bradley (Yale University)
My dissertation research focuses on the application of hierarchical learning and heuristics based on social signals to solve challenges inherent to enabling human-robot collaboration. I approach this problem through advancing the state of the art in building hierarchical task representations, multi-agent task-level planning, and learning assistive behaviors from demonstration.