Deep Learning
'Almost Sure' Chaotic Properties of Machine Learning Methods
Mondal, Nabarun, Ghosh, Partha P.
It has been demonstrated earlier that universal computation is 'almost surely' chaotic. Machine learning is a form of computational fixed point iteration, iterating over the computable function space. We showcase some properties of this iteration, and establish in general that the iteration is 'almost surely' of chaotic nature. This theory explains the counterintuitive properties observed in deep learning methods. This paper demonstrates that these properties are universal to any learning method.
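As a rough illustration of what chaotic behaviour in an iterated map looks like, the sketch below measures how far two nearly identical starting points drift apart. It uses the logistic map, a standard textbook example of a chaotic iteration; it is not the iteration over function space that the paper studies, and the names `logistic_orbit` and `max_gap` are hypothetical.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every visited state."""
    orbit = [x0]
    for _ in range(steps):
        x = orbit[-1]
        orbit.append(r * x * (1.0 - x))
    return orbit


def max_gap(x0, eps, steps):
    """Largest distance between the orbits started at x0 and at x0 + eps."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + eps, steps)
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    # Assumed toy values: a perturbation of 1e-10 stays tiny over a few
    # steps but grows to the scale of the whole interval over many steps.
    print("gap after 5 steps:  ", max_gap(0.2, 1e-10, 5))
    print("gap after 100 steps:", max_gap(0.2, 1e-10, 100))
```

Sensitivity to initial conditions of this kind is one informal way to read the claim that small changes to a learning iteration can lead to very different outcomes.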
Transformed Representations for Convolutional Neural Networks in Diabetic Retinopathy Screening
Lim, Gilbert (National University of Singapore) | Lee, Mong Li (National University of Singapore) | Hsu, Wynne (National University of Singapore) | Wong, Tien Yin (Singapore National Eye Centre)
Convolutional neural networks (CNNs) are flexible, biologically-inspired variants of multi-layer perceptrons that have proven themselves to be exceptionally suited to discriminative vision tasks. However, relatively little is known about whether they can be made both more efficient and more accurate by introducing suitable transformations that exploit general knowledge of the target classes. We demonstrate this functionality through pre-segmentation of input images with a fast and robust but loose segmentation step, to obtain a set of candidate objects. These objects then undergo a spatial transformation into a reduced space, retaining only a compact high-level representation of their appearance. Additional attributes may be abstracted as raw features that are incorporated after the convolutional phase of the network. Finally, we compare its performance against existing approaches on the challenging problem of detecting lesions in retinal images.
Hybrid Heterogeneous Transfer Learning through Deep Learning
Zhou, Joey Tianyi (Nanyang Technological University) | Pan, Sinno Jialin (Institute for Infocomm Research) | Tsang, Ivor W. (University of Technology, Sydney) | Yan, Yan (University of Queensland)
Most previous heterogeneous transfer learning methods learn a cross-domain feature mapping between heterogeneous feature spaces based on a few cross-domain instance-correspondences, and these corresponding instances are assumed to be representative in the source and target domains respectively. However, in many real-world scenarios, this assumption may not hold. As a result, the constructed feature mapping may not be precise, due to bias in the correspondences in the target and/or source domains. In this case, a classifier trained on the labeled transformed-source-domain data may not be useful for the target domain. In this paper, we present a new transfer learning framework called Hybrid Heterogeneous Transfer Learning (HHTL), which allows the corresponding instances across domains to be biased in either the source or target domain. Specifically, we propose a deep learning approach to learn a feature mapping between cross-domain heterogeneous features as well as a better feature representation for mapped data to reduce the bias caused by the cross-domain correspondences. Extensive experiments on several multilingual sentiment classification tasks verify the effectiveness of our proposed approach compared with some baseline methods.
Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network
Luo, Yong (Peking University) | Tang, Jian (Peking University) | Yan, Jun (Microsoft Research Asia) | Xu, Chao (Peking University) | Chen, Zheng (Microsoft Research Asia)
Word embedding aims to learn a continuous representation for each word. It attracts increasing attention due to its effectiveness in various tasks such as named entity recognition and language modeling. Most existing word embeddings are trained on a single data source such as news pages or Wikipedia articles. However, when we apply them to other tasks such as web search, the performance suffers. To obtain a robust word embedding for different applications, multiple data sources could be leveraged. In this paper, we propose a two-side multimodal neural network to learn a robust word embedding from multiple data sources including free text, user search queries and search click-through data. This framework takes the word embeddings learned from different data sources as pre-training, and then uses a two-side neural network to unify these embeddings. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Since the proposed neural network does not need to re-train word embeddings for a new task, it is highly scalable in real-world problem solving. In addition, the network allows different sources to be weighted differently for different application tasks. Experiments on two real-world applications, web search ranking and word similarity measuring, show that our neural network with multiple sources outperforms state-of-the-art word embedding algorithms trained on each individual source. It also outperforms other competitive baselines using multiple sources.
Deep Modeling of Group Preferences for Group-Based Recommendation
Hu, Liang (Shanghai Jiaotong University) | Cao, Jian (Shanghai Jiaotong University) | Xu, Guandong (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Gu, Zhiping (Shanghai Technical Institute of Electronics &) | Cao, Wei (Information)
Nowadays, most recommender systems (RSs) mainly aim to suggest appropriate items for individuals. Due to the social nature of human beings, group activities have become an integral part of our daily life, thus motivating the study of group RSs (GRSs). However, most existing GRS methods make recommendations by aggregating individual ratings or individual predictive results rather than considering the collective features that govern user choices made within a group. As a result, such methods are highly sensitive to the data, and they often fail to learn group preferences when the data are slightly inconsistent with predefined aggregation assumptions. To this end, we devise a novel GRS approach which accommodates both individual choices and group decisions in a joint model. More specifically, we propose a deep-architecture model built with collective deep belief networks and dual-wing restricted Boltzmann machines. With such a deep model, we can use high-level features, which are induced from lower-level features, to represent group preference and so reduce this sensitivity to the data. Finally, experiments conducted on a real-world dataset demonstrate the superiority of our deep model over other state-of-the-art methods.
Echo-State Conditional Restricted Boltzmann Machines
Chatzis, Sotirios (Cyprus University of Technology)
Restricted Boltzmann machines (RBMs) are a powerful generative modeling technique, based on a complex graphical model of hidden (latent) variables. Conditional RBMs (CRBMs) are an extension of RBMs tailored to modeling temporal data. A drawback of CRBMs is that they consider only linear temporal dependencies, which limits their capability to capture complex temporal structure. They also require many variables to model long temporal dependencies, which may make them prone to overfitting. To resolve these issues, in this paper we propose the echo-state CRBM (ES-CRBM): our model uses an echo-state network reservoir in the context of CRBMs to efficiently capture long and complex temporal dynamics, with far fewer trainable parameters than conventional CRBMs. In addition, we introduce an (implicit) mixture of ES-CRBM experts (im-ES-CRBM) to further enhance the capabilities of our ES-CRBM model. The introduced im-ES-CRBM allows for better modeling of temporal observations which might comprise a number of latent or observable subpatterns that alternate in a dynamic fashion. It also allows for performing sequence segmentation using our framework. We apply our methods to sequential data modeling and classification experiments using public datasets. As we show, our approach outperforms both existing RBM-based approaches and related state-of-the-art methods, such as conditional random fields.
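To make the echo-state idea concrete, here is a minimal sketch of a generic echo-state reservoir: fixed random input and recurrent weights drive a tanh state update, and nothing in the reservoir is trained. It does not implement the ES-CRBM itself (the RBM that would be conditioned on the reservoir state is omitted), and the function names, the weight scaling, and the toy input are all assumptions made for illustration.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.5, seed=0):
    """Draw fixed (untrained) input and recurrent weight matrices."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    # Assumed scaling: dividing by n_res keeps the recurrent weights small
    # so the state is dominated by recent inputs and does not blow up.
    w = [[rng.uniform(-1, 1) * scale / n_res for _ in range(n_res)]
         for _ in range(n_res)]
    return w_in, w


def run_reservoir(inputs, w_in, w):
    """Return the reservoir state after each input vector."""
    n_res = len(w)
    state = [0.0] * n_res
    states = []
    for u in inputs:
        # state_t = tanh(W_in u_t + W state_{t-1}); the right-hand side
        # reads the previous state before the name is rebound.
        state = [
            math.tanh(
                sum(w_in[i][j] * u[j] for j in range(len(u)))
                + sum(w[i][k] * state[k] for k in range(n_res))
            )
            for i in range(n_res)
        ]
        states.append(state)
    return states


if __name__ == "__main__":
    w_in, w = make_reservoir(n_in=2, n_res=8)
    sequence = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(20)]
    states = run_reservoir(sequence, w_in, w)
    print(len(states), len(states[0]))
```

Because only a readout (or, in the abstract's setting, the model conditioned on these states) would be trained, the number of trainable parameters does not grow with the length of temporal context the reservoir retains.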
Sequential Click Prediction for Sponsored Search with Recurrent Neural Networks
Zhang, Yuyu (Chinese Academy of Sciences) | Dai, Hanjun (Fudan University) | Xu, Chang (Nankai University) | Feng, Jun (Tsinghua University) | Wang, Taifeng (Microsoft Research) | Bian, Jiang (Microsoft Research) | Wang, Bin (Chinese Academy of Sciences) | Liu, Tie-Yan (Microsoft Research)
Click prediction is one of the fundamental problems in sponsored search. Most existing studies use machine learning approaches to predict ad clicks for each ad view event independently. However, as observed in real-world sponsored search systems, a user's behavior on ads depends strongly on how the user behaved in the past, especially in terms of what queries she submitted, what ads she clicked or ignored, and how long she spent on the landing pages of clicked ads. Inspired by these observations, we introduce a novel framework based on Recurrent Neural Networks (RNN). Compared to traditional methods, this framework directly models the dependency on the user's sequential behaviors in the click prediction process through the recurrent structure of the RNN. Large-scale evaluations on the click-through logs from a commercial search engine demonstrate that our approach can significantly improve click prediction accuracy compared to sequence-independent approaches.
Learning Deep Representations for Graph Clustering
Tian, Fei (University of Science and Technology of China) | Gao, Bin (Microsoft Research) | Cui, Qing (Tsinghua University) | Chen, Enhong (University of Science and Technology of China) | Liu, Tie-Yan (Microsoft Research)
Recently, deep learning has been successfully adopted in many applications such as speech recognition and image classification. In this work, we explore the possibility of employing deep learning in graph clustering. We propose a simple method, which first learns a nonlinear embedding of the original graph with a stacked autoencoder, and then runs the $k$-means algorithm on the embedding to obtain the clustering result. We show that this simple method has a solid theoretical foundation, due to the similarity between autoencoders and spectral clustering in terms of what they actually optimize. Then, we demonstrate that the proposed method is more efficient and flexible than spectral clustering. First, the computational complexity of the autoencoder is much lower than that of spectral clustering: the former can be linear in the number of nodes in a sparse graph, while the latter is super-quadratic due to eigenvalue decomposition. Second, when an additional sparsity constraint is imposed, we can simply employ the sparse autoencoder developed in the deep learning literature; however, it is not straightforward to implement a sparse spectral method. The experimental results on various graph datasets show that the proposed method significantly outperforms conventional spectral clustering, which clearly indicates the effectiveness of deep learning in graph clustering.
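The second stage of the pipeline described above, running $k$-means on an embedding, can be sketched as follows. The stacked autoencoder is omitted: the `embedding` points are a made-up stand-in for its output, and the function name `kmeans`, the fixed initial centers, and the iteration count are assumptions for illustration only.

```python
def kmeans(points, centers, iters=20):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])),
            )
            for pt in points
        ]
        # Update step: each center moves to the mean of its members;
        # a center with no members is left where it is.
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Hypothetical 2-D node embeddings standing in for autoencoder output.
    embedding = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
                 (5.0, 5.1), (5.1, 5.0), (4.9, 5.0)]
    labels, centers = kmeans(embedding, centers=[embedding[0], embedding[3]])
    print(labels)
```

Each iteration costs time proportional to the number of nodes times the number of clusters times the embedding dimension, so this stage does not add an eigendecomposition-style bottleneck.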
On the Challenges of Physical Implementations of RBMs
Dumoulin, Vincent (Université de Montréal) | Goodfellow, Ian J (Université de Montréal) | Courville, Aaron (Université de Montréal) | Bengio, Yoshua (Université de Montréal)
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low-precision and limited range of the parameters and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are based on the D-Wave Two computer, but the issues we investigate arise in most forms of physical computation. Our findings suggest that designers of new physical computing hardware and algorithms for physical computers should focus their efforts on overcoming the limitations imposed by the topology restrictions of currently existing physical computers.
Efficient Codes for Inverse Dynamics During Walking
Johnson, Leif (The University of Texas at Austin) | Ballard, Dana H (The University of Texas at Austin)
Efficient codes have been used effectively in both computer science and neuroscience to better understand the information processing in visual and auditory encoding and discrimination tasks. In this paper, we explore the use of efficient codes for representing information relevant to human movements during locomotion. Specifically, we apply motion capture data to a physical model of the human skeleton to compute joint angles (inverse kinematics) and joint torques (inverse dynamics); then, by treating the resulting paired dataset as a supervised regression problem, we investigate the effect of sparsity in mapping from angles to torques. The results of our investigation suggest that sparse codes can indeed represent salient features of both the kinematic and dynamic views of human locomotion movements. However, sparsity appears to be only one parameter in building a model of inverse dynamics; we also show that the "encoding" process benefits significantly by integrating with the "regression" process for this task. In addition, we show that, for this task, simple coding and decoding methods are not sufficient to model the extremely complex inverse dynamics mapping. Finally, we use our results to argue that representations of movement are critical to modeling and understanding these movements.