Deep Learning
Learning Face Hallucination in the Wild
Zhou, Erjin (Tsinghua University) | Fan, Haoqiang (Tsinghua University) | Cao, Zhimin (Megvii Technology) | Jiang, Yuning (Megvii Technology) | Yin, Qi (Megvii Technology)
Face hallucination methods generate high-resolution images from low-resolution ones for better visualization. However, conventional hallucination methods are often designed for controlled settings and cannot handle varying conditions of pose, resolution degree, and blur. In this paper, we present a new method of face hallucination, which can consistently improve the resolution of face images even with large appearance variations. Our method is based on a novel network architecture called Bi-channel Convolutional Neural Network (Bi-channel CNN). It extracts robust face representations from raw input using a deep convolutional network, then adaptively integrates two channels of information (the raw input image and face representations) to predict the high-resolution image. Experimental results show our system outperforms the prior state-of-the-art methods.
Deep Representation Learning with Target Coding
Yang, Shuo (The Chinese University of Hong Kong) | Luo, Ping (The Chinese University of Hong Kong) | Loy, Chen Change (The Chinese University of Hong Kong) | Shum, Kenneth W. (The Chinese University of Hong Kong) | Tang, Xiaoou (The Chinese University of Hong Kong)
We consider the problem of learning deep representations when target labels are available. In this paper, we show that there exists an intrinsic relationship between target coding and feature representation learning in deep networks. Specifically, we found that a distributed binary code with error-correcting capability is more capable of encouraging discriminative features than the 1-of-K coding that is typically used in supervised deep learning. This new finding reveals an additional benefit of using error-correcting codes for deep model learning, apart from their well-known error-correcting property. Extensive experiments are conducted on popular visual benchmark datasets.
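The contrast between 1-of-K targets and an error-correcting target code can be illustrated with a small sketch. It assumes, purely for illustration, a Hadamard code built with the Sylvester construction as the error-correcting code; the abstract does not say which code is used, and the helper names (`hadamard`, `to_bits`, `one_hot`, `min_hamming_distance`) are hypothetical. The sketch compares the minimum Hamming distance between class codewords, which is what gives a code its error-correcting capability.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (+1/-1 entries).

    n must be a power of two. This choice of code is an assumption made
    for illustration only.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Block step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def to_bits(matrix):
    """Map +1/-1 entries to 1/0 so each row becomes a binary codeword."""
    return [[1 if x > 0 else 0 for x in row] for row in matrix]


def one_hot(k):
    """Standard 1-of-K target codes: one codeword per class."""
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def min_hamming_distance(codes):
    """Smallest number of differing bits between any two codewords."""
    return min(
        sum(a != b for a, b in zip(u, v))
        for i, u in enumerate(codes)
        for v in codes[i + 1:]
    )


if __name__ == "__main__":
    k = 8
    print("1-of-K  :", min_hamming_distance(one_hot(k)))
    print("Hadamard:", min_hamming_distance(to_bits(hadamard(k))))
```

With eight classes and eight-bit codewords in both cases, any two 1-of-K codewords differ in 2 bits, while any two Hadamard codewords differ in 4 bits, so a single flipped output bit can still be decoded to the correct class.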
Sparse Deep Stacking Network for Image Classification
Li, Jun (Nanjing University of Science and Technology) | Chang, Heyou (Nanjing University of Science and Technology) | Yang, Jian (Nanjing University of Science and Technology)
Sparse coding can learn representations that are robust to noise and can model higher-order representations for image classification. However, the inference algorithm is computationally expensive even though supervised signals are used to learn compact and discriminative dictionaries in sparse coding techniques. Fortunately, a simplified neural network module (SNNM) has been proposed to directly learn the discriminative dictionaries and avoid the expensive inference. But the SNNM ignores sparse representations. Therefore, we propose a sparse SNNM by adding mixed-norm regularization (l1/l2 norm). The sparse SNNMs are further stacked to build a sparse deep stacking network (S-DSN). In the experiments, we evaluate S-DSN on four databases: Extended YaleB, AR, 15 scene and Caltech101. Experimental results show that our model outperforms related classification methods with only a linear classifier. It is worth noting that we reach 98.8% recognition accuracy on 15 scene.
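As a small worked example of the mixed-norm penalty mentioned above, the sketch below computes an l1/l2 norm as the sum of the Euclidean norms of groups. Treating each row of the matrix as one group is an assumption made here for illustration; the abstract does not state how the groups are defined, and the function name `l1_l2_norm` is hypothetical.

```python
import math


def l1_l2_norm(matrix):
    """Mixed l1/l2 norm: l2 norm within each group, l1 (sum) across groups.

    Each row is treated as one group (an illustrative assumption).
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


if __name__ == "__main__":
    # The second group is entirely zero, so it adds nothing to the penalty.
    H = [[3.0, 4.0], [0.0, 0.0], [0.0, 5.0]]
    print(l1_l2_norm(H))  # 10.0
```

Because whole groups contribute a single non-negative term, minimizing this penalty tends to switch off entire groups rather than individual entries.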
Compute Less to Get More: Using ORC to Improve Sparse Filtering
Lederer, Johannes (Cornell University) | Guadarrama, Sergio (University of California at Berkeley)
Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering with spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests early stopping of Sparse Filtering. We therefore introduce the Optimal Roundness Criterion (ORC), a novel stopping criterion for Sparse Filtering. We show that this stopping criterion is related to pre-processing procedures such as Statistical Whitening and demonstrate that it can make image classification with Sparse Filtering considerably faster and more accurate.
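For readers unfamiliar with Sparse Filtering, the following is a minimal sketch of the objective it is commonly described as minimizing: features are normalized per feature across examples, then per example across features, and the absolute values are summed. This is background only and does not implement ORC, whose definition is not given in the abstract. The function name, the `eps` smoothing constant, and the row-per-feature layout are assumptions for illustration.

```python
import math


def sparse_filtering_objective(features, eps=1e-8):
    """Sparse Filtering objective for a feature matrix.

    features: list of rows, one row per feature and one column per example
    (layout chosen here for illustration).
    """
    # Soft absolute value keeps the objective smooth around zero.
    f = [[math.sqrt(x * x + eps) for x in row] for row in features]
    # Normalize each feature (row) to unit l2 norm across examples.
    f = [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in f]
    # Normalize each example (column) to unit l2 norm across features.
    n_cols = len(f[0])
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in f)) for j in range(n_cols)]
    f = [[row[j] / col_norms[j] for j in range(n_cols)] for row in f]
    # l1 penalty on the doubly normalized features.
    return sum(sum(row) for row in f)


if __name__ == "__main__":
    sparse = [[1.0, 0.0], [0.0, 1.0]]  # each example activates one feature
    dense = [[1.0, 1.0], [1.0, 1.0]]   # every example activates every feature
    print(sparse_filtering_objective(sparse))
    print(sparse_filtering_objective(dense))
```

The sparse feature matrix yields the lower objective value, which is the behavior the method rewards.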
Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web
Yang, Yezhou (University of Maryland College Park) | Li, Yi (NICTA, Australia) | Fermuller, Cornelia (University of Maryland) | Aloimonos, Yiannis (University of Maryland)
In order to advance action generation and creation in robots beyond simple learned schemas, we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions that make up the longer actions seen in video, in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by "watching" unconstrained videos with high accuracy.
Gaussian Cardinality Restricted Boltzmann Machines
Wan, Cheng (Tsinghua University) | Jin, Xiaoming (Tsinghua University) | Ding, Guiguang (Tsinghua University) | Shen, Dou (Tsinghua University)
The Restricted Boltzmann Machine (RBM) has been applied to a wide variety of tasks due to its advantage in feature extraction. Imposing a sparsity constraint on the activated hidden units is an important improvement to the RBM. The sparsity constraints in existing methods are usually specified by users and are independent of the input data. However, the input data could be heterogeneous in content and thus naturally demand elastic and adaptive settings of the sparsity constraints. To solve this problem, we propose a generalized model with an adaptive sparsity constraint, named Gaussian Cardinality Restricted Boltzmann Machines (GC-RBM). In this model, the thresholds of hidden unit activations are decided by the input data and a given Gaussian distribution in the pre-training phase. We provide a principled method to train the GC-RBM with a Gaussian prior. Experimental results on two real-world data sets demonstrate the effectiveness of the proposed method and its superiority over CaRBM in terms of classification accuracy.
Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation
Szerlip, Paul A. (University of Central Florida) | Morse, Gregory (University of Central Florida) | Pugh, Justin K. (University of Central Florida) | Stanley, Kenneth O. (University of Central Florida)
The increasing realization in recent years that artificial neural networks (ANNs) can learn many layers of features (Bengio et al. 2007; Hinton, Osindero, and Teh 2006; Marc'Aurelio, Boureau, and LeCun 2007; Cireşan et al. 2010) has reinvigorated the study of representation learning in ANNs (Bengio, Courville, and Vincent 2013). While the beginning of this renaissance focused on the sequential unsupervised training of individual layers one upon another (Bengio et al. 2007; Hinton, Osindero, and Teh 2006), the [...] In particular, there is an alternative kind of discriminative learning that is unsupervised rather than supervised. In this proposed alternative approach, called divergent discriminative feature accumulation (DDFA), instead of searching for features constrained by the objective of solving the discriminative classification problem, a learning algorithm can instead attempt to collect as many features that discriminate strongly among training examples as possible, without regard to any particular classification problem.
Tensor-Variate Restricted Boltzmann Machines
Nguyen, Tu Dinh (Deakin University) | Tran, Truyen (Deakin University and Curtin University) | Phung, Dinh (Deakin University) | Venkatesh, Svetha (Deakin University)
Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Applying standard RBMs to such data would require vectorizing matrices and tensors, resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than those of its rivals, resulting in better classification performance.
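The compactness claim can be made concrete with a back-of-the-envelope parameter count. The sketch below assumes, for illustration only, that the weight tensor is factorized into a sum of rank-one terms with one factor vector per data mode plus one for the hidden layer; the abstract does not specify the parameterization, and the function names and the sizes used (32 per mode, 100 hidden units, 50 factors) are hypothetical. Biases are ignored.

```python
from math import prod


def vectorized_rbm_weights(mode_sizes, n_hidden):
    """Weights of a standard RBM applied to the flattened tensor."""
    return prod(mode_sizes) * n_hidden


def factorized_weights(mode_sizes, n_hidden, n_factors):
    """Weights under an assumed rank-one factorization.

    Each factor holds one vector per data mode plus one for the hidden layer,
    so the count grows with the sum of the mode sizes, not their product.
    """
    return n_factors * (sum(mode_sizes) + n_hidden)


if __name__ == "__main__":
    for n_modes in (2, 3, 4):
        sizes = [32] * n_modes
        print(
            n_modes,
            vectorized_rbm_weights(sizes, 100),
            factorized_weights(sizes, 100, 50),
        )
```

Under these assumptions, each added mode of size 32 multiplies the vectorized count by 32 but adds only 50 × 32 = 1600 weights to the factorized count.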
Deep Modeling Complex Couplings within Financial Markets
Cao, Wei (University of Technology, Sydney) | Hu, Liang (University of Technology and Shanghai Jiaotong University) | Cao, Longbing (University of Technology)
The global financial crisis of 2008, its contagion to other regions, and its long-lasting impact on different markets show that it is increasingly important to understand the complicated coupling relationships across financial markets. This is very difficult, as complex hidden coupling relationships exist between different financial markets in various countries and are very hard to model. The couplings involve interactions between homogeneous markets from various countries (which we call intra-market coupling), interactions between heterogeneous markets (inter-market coupling) and interactions between current and past market behaviors (temporal coupling). Very limited work has been done towards modeling such complex couplings; some existing methods predict market movement by simply aggregating indicators from various markets while ignoring the inbuilt couplings. As a result, these methods are highly sensitive to observations, and may often fail when financial indicators change slightly. In this paper, a coupled deep belief network is designed to accommodate the above three types of couplings across financial markets. With a deep-architecture model to capture the high-level coupled features, the proposed approach can infer market trends. Experimental results on data of stock and currency markets from three countries show that our approach outperforms other baselines, from both technical and business perspectives.
Ordering-Sensitive and Semantic-Aware Topic Modeling
Yang, Min (The University of Hong Kong) | Cui, Tianyi (Zhejiang University) | Tu, Wenting (The University of Hong Kong)
Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the “bag-of-words” assumption, which ignores the ordering of words. This assumption simplifies the computation, but it unrealistically loses the ordering information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM) which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vectors of its surrounding words and the context. The Gaussian mixture components and the topics of documents, sentences and words can be learnt jointly. Extensive experiments show that our model can learn better topics and more accurate word distributions for each topic. Quantitatively, compared to state-of-the-art topic modeling approaches, GMNTM obtains significantly better performance in terms of perplexity, retrieval accuracy and classification accuracy.