Image Feature Learning for Cold Start Problem in Display Advertising
Mo, Kaixiang (Hong Kong University of Science and Technology) | Liu, Bo (Hong Kong University of Science and Technology) | Xiao, Lei (Tencent Inc., Shenzhen) | Li, Yong (Tencent Inc., Shenzhen) | Jiang, Jie (Tencent Inc., Shenzhen)
In online display advertising, state-of-the-art Click Through Rate (CTR) prediction algorithms rely heavily on historical information, and they work poorly on the growing number of new ads without any historical information. This is known as the cold start problem. For image ads, current state-of-the-art systems use handcrafted image features such as multimedia features and SIFT features to capture the attractiveness of ads. However, these handcrafted features are task dependent, inflexible and heuristic. In order to tackle the cold start problem in image display ads, we propose a new feature learning architecture to learn the most discriminative image features directly from raw pixels and user feedback in the target task. The proposed method is flexible and does not depend on human heuristics. Extensive experiments on a real world dataset with 47 billion records show that our feature learning method outperforms existing handcrafted features significantly, and it can extract discriminative and meaningful features.
EntScene: Nonparametric Bayesian Temporal Segmentation of Videos Aimed at Entity-Driven Scene Detection
Mitra, Adway (Indian Institute of Science) | Bhattacharyya, Chiranjib (Indian Institute of Science) | Biswas, Soma (Indian Institute of Science)
In this paper, we study Bayesian techniques for entity discovery and temporal segmentation of videos. Existing temporal video segmentation techniques are based on low-level features, and are usually suitable for discovering short, homogeneous shots rather than diverse scenes, each of which contains several such shots. We define scenes in terms of semantic entities (e.g., persons). This is the first attempt at entity-driven scene discovery in videos, without using meta-data like scripts. The problem is hard because we have no explicit prior information about the entities and the scenes. However, such sequential data exhibit temporal coherence in multiple ways, and this provides implicit cues. To capture these, we propose a Bayesian generative model, EntScene, that represents entities with mixture components and scenes with discrete distributions over these components. The most challenging part of this approach is the inference, as it involves complex interactions of latent variables. To this end, we propose an algorithm based on Dynamic Blocked Gibbs Sampling that attempts to jointly learn the components and the segmentation by progressively merging an initial set of short segments. The proposed algorithm compares favourably against suitably designed baselines on several TV-series videos. We extend the method to an unexplored problem: temporal co-segmentation of videos containing the same entities.
Optimizing Locally Linear Classifiers with Supervised Anchor Point Learning
Mao, Xue (Chinese Academy of Sciences) | Fu, Zhouyu (University of Western Sydney) | Wu, Ou (Chinese Academy of Sciences) | Hu, Weiming (Chinese Academy of Sciences)
Kernel SVM suffers from high computational complexity when dealing with large-scale nonlinear datasets. To address this issue, locally linear classifiers have been proposed for approximating nonlinear decision boundaries with locally linear functions using a local coding scheme. The effectiveness of such a coding scheme depends heavily on the quality of the anchor points chosen to produce the local codes. Existing methods usually involve a phase of unsupervised anchor point learning followed by supervised classifier learning. Thus, the anchor points and classifiers are obtained separately, and the learned anchor points may not be optimal for the discriminative task. In this paper, we present a novel fully supervised approach for anchor point learning. A single optimization problem is formulated over both anchor point and classifier variables, optimizing the initial anchor points jointly with the classifiers to minimize the classification risk. Experimental results show that our method outperforms other competitive methods which employ unsupervised anchor point learning and achieves performance on par with the kernel SVM, albeit with much improved efficiency.
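To illustrate the idea of a locally linear classifier driven by anchor points, the following is a minimal sketch, not the authors' method. It assumes a simple inverse-distance coding over the k nearest anchors; the abstract does not specify the coding scheme, and the names `local_codes`, `predict`, `anchors`, `W` and `b` are hypothetical. The decision score is a code-weighted sum of per-anchor linear functions.

```python
import math

def local_codes(x, anchors, k=2):
    """Assumed coding scheme: normalized inverse-distance weights
    over the k anchors nearest to x (all other anchors get weight 0)."""
    dists = [math.dist(x, a) for a in anchors]
    nearest = sorted(range(len(anchors)), key=lambda j: dists[j])[:k]
    weights = {j: 1.0 / (dists[j] + 1e-9) for j in nearest}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def predict(x, anchors, W, b, k=2):
    """Score = sum_j code_j(x) * (w_j . x + b_j): each anchor owns one
    linear classifier, and the local codes blend them near x."""
    codes = local_codes(x, anchors, k)
    return sum(
        g * (sum(wi * xi for wi, xi in zip(W[j], x)) + b[j])
        for j, g in codes.items()
    )

# Toy data: three anchors, one linear classifier per anchor.
anchors = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
W = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]
b = [0.0, 2.0, 0.0]
x = (1.0, 0.0)

codes = local_codes(x, anchors)
score = predict(x, anchors, W, b)
```

In this sketch the anchors are fixed inputs. The abstract's contribution is to treat them as variables optimized jointly with `W` and `b`, which is not shown here.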
Multi-Task Multi-Dimensional Hawkes Processes for Modeling Event Sequences
Luo, Dixin (Shanghai Jiao Tong University) | Xu, Hongteng (Georgia Institute of Technology) | Zhen, Yi (Georgia Institute of Technology) | Ning, Xia (Indiana University-Purdue University Indianapolis) | Zha, Hongyuan (Georgia Institute of Technology) | Yang, Xiaokang (Shanghai Jiao Tong University) | Zhang, Wenjun (Shanghai Jiao Tong University)
We propose a Multi-task Multi-dimensional Hawkes Process (MMHP) for modeling event sequences where there exist multiple triggering patterns within sequences and structures across sequences. MMHP is able to model the dynamics of multiple sequences jointly by imposing structural constraints and thus systematically uncover clustering structure among sequences. We propose an effective and robust optimization algorithm to learn MMHP models, which takes advantage of the alternating direction method of multipliers (ADMM), majorization minimization, and Euler-Lagrange equations. Our experimental results demonstrate that MMHP performs well on both synthetic and real data.
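As background for the term "multi-dimensional Hawkes process", the sketch below evaluates the conditional intensity of one dimension given past events. It assumes an exponential triggering kernel with a single shared decay rate, which the abstract does not state. The function name `hawkes_intensity` and the parameters `mu`, `alpha` and `decay` are hypothetical, and the multi-task structure and learning algorithm of MMHP are not represented.

```python
import math

def hawkes_intensity(t, dim, events, mu, alpha, decay):
    """Intensity of dimension `dim` at time t:
    lambda_dim(t) = mu[dim] + sum over past events (t_k, d_k) of
                    alpha[dim][d_k] * exp(-decay * (t - t_k)).
    events: list of (time, dimension) tuples.
    alpha[i][j]: how strongly an event in dimension j excites dimension i.
    """
    rate = mu[dim]
    for t_k, d_k in events:
        if t_k < t:  # only events strictly before t contribute
            rate += alpha[dim][d_k] * math.exp(-decay * (t - t_k))
    return rate

# Toy two-dimensional example.
events = [(1.0, 0), (2.0, 1)]
mu = [0.5, 0.2]
alpha = [[0.3, 0.0],
         [0.4, 0.1]]

lam_before = hawkes_intensity(0.5, 0, events, mu, alpha, decay=1.0)
lam_after = hawkes_intensity(3.0, 1, events, mu, alpha, decay=1.0)
```

Before any event occurs, the intensity equals the base rate; afterwards, each past event adds an exponentially decaying contribution scaled by the corresponding entry of `alpha`.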
Robust Kernel Dictionary Learning Using a Whole Sequence Convergent Algorithm
Liu, Huaping (Tsinghua University) | Qin, Jie (Tsinghua University) | Cheng, Hong (University of Electronic Science and Technology of China) | Sun, Fuchun (Tsinghua University)
Kernel sparse coding is an effective strategy to capture the non-linear structure of data samples. However, how to learn a robust kernel dictionary remains an open problem. In this paper, we propose a new optimization model to learn the robust kernel dictionary while isolating outliers in the training samples. This model is essentially based on the decomposition of the reconstruction error into small dense noises and large sparse outliers. The outlier error term is formulated as the product of the sample matrix in the feature space and a diagonal coefficient matrix. This facilitates the kernelized dictionary learning. To solve the non-convex optimization problem, we develop a whole sequence convergent algorithm which guarantees the obtained solution sequence is a Cauchy sequence. The experimental results show that the proposed robust kernel dictionary learning method provides significant performance improvement.
Regularizing Flat Latent Variables with Hierarchical Structures
Lin, Rongcheng (University of North Carolina at Charlotte) | Li, Huayu (University of North Carolina at Charlotte) | Quan, Xiaojun (Institute for Infocomm Research) | Hong, Richang (Hefei University of Technology) | Wu, Zhiang (Nanjing University of Finance and Economics) | Ge, Yong (University of North Carolina at Charlotte)
In this paper, we propose a stratified topic model (STM). Instead of directly modeling and inferring flat topics or hierarchically structured topics, we use the stratified relationships in topic hierarchies to regularize the flat topics. The topic structures are captured by a hierarchical clustering method and act as constraints during the learning process. We propose two theoretically sound and practical inference methods to solve the model. Experimental results with two real world data sets and various evaluation metrics demonstrate the effectiveness of the proposed model.
Mixed Error Coding for Face Recognition with Mixed Occlusions
Liang, Ronghua (Zhejiang University of Technology) | Li, Xiao-Xin (Zhejiang University of Technology)
Mixed occlusions commonly exist in real-world face images and bring great challenges for automatic face recognition. Existing methods usually use the same reconstruction error both to code the occluded test image with respect to the labeled training set and to estimate the occlusion/feature support. However, this error coding model might not be applicable for face recognition with mixed occlusions. For mixed occlusions, the error used to code the test image, called the discriminative error, and the error used to estimate the occlusion support, called the structural error, might have totally different behaviors. By combining the two different errors with the occlusion support, we present an extended error coding model, dubbed Mixed Error Coding (MEC). To further enhance discriminability and feature selection ability, we also incorporate into MEC the hidden feature selection technique of the subspace learning methods in the domain of the image gradient orientations. Experiments demonstrate the effectiveness and robustness of the proposed MEC model in dealing with mixed occlusions.
Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
Li, Yitan (University of Science and Technology of China) | Xu, Linli (University of Science and Technology of China) | Tian, Fei (University of Science and Technology of China) | Jiang, Liang (University of Science and Technology of China) | Zhong, Xiaowei (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China)
Recently, significant advances have been witnessed in the area of distributed word representations based on neural networks, which are also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS are still not well understood, except for a recent work that explains SGNS as an implicit matrix factorization of the pointwise mutual information (PMI) matrix. In this paper, we provide a new perspective for further understanding SGNS. We point out that SGNS is essentially a representation learning method, which learns to represent the co-occurrence vector for a word. Based on the representation learning view, SGNS is in fact an explicit matrix factorization (EMF) of the words' co-occurrence matrix. Furthermore, extended supervised word embedding can be established based on our proposed representation learning view.
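For readers unfamiliar with the PMI matrix mentioned above, the following minimal sketch computes PMI values from a list of (word, context) co-occurrence pairs. It only illustrates the standard definition PMI(w, c) = log(P(w, c) / (P(w) P(c))) and is not the EMF formulation proposed in the paper; the function name `pmi_matrix` and the toy pairs are hypothetical.

```python
import math
from collections import Counter

def pmi_matrix(pairs):
    """Return {(word, context): PMI} for every observed pair.
    Probabilities are estimated by relative frequencies in `pairs`."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    pmi = {}
    for (w, c), n_wc in pair_counts.items():
        # P(w,c) / (P(w) P(c)) = n_wc * total / (n_w * n_c)
        pmi[(w, c)] = math.log(n_wc * total / (word_counts[w] * ctx_counts[c]))
    return pmi

# Toy co-occurrence data.
pairs = [
    ("cat", "purrs"), ("cat", "purrs"), ("cat", "sleeps"),
    ("dog", "barks"), ("dog", "barks"), ("dog", "sleeps"),
]
pmi = pmi_matrix(pairs)
```

Pairs that co-occur more often than chance get positive PMI, while a pair that co-occurs exactly as often as independence would predict, such as ("cat", "sleeps") here, gets a PMI of zero.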
Multi-Task Model and Feature Joint Learning
Li, Ya (University of Science and Technology of China) | Tian, Xinmei (University of Science and Technology of China) | Liu, Tongliang (University of Technology, Sydney) | Tao, Dacheng (University of Technology, Sydney)
Given several tasks, multi-task learning (MTL) learns multiple tasks jointly by exploring the interdependence between them. The basic assumption in MTL is that those tasks are indeed related. Existing MTL methods model the task relatedness/interdependence in two different ways, either common parameter-sharing or common feature-sharing across tasks. In this paper, we propose a novel multi-task learning method to jointly learn shared parameters and shared feature representation. Our objective is to learn a set of common features with which the tasks are related as closely as possible, so that common parameters shared across tasks can be optimally learned. We present a detailed derivation of our multi-task learning method and propose an alternating algorithm to solve the non-convex optimization problem. We further present a theoretical bound which directly demonstrates that the proposed multi-task learning method can successfully model the relatedness via joint common parameter- and common feature-learning. Extensive experiments are conducted on several real world multi-task learning datasets. All results demonstrate the effectiveness of our multi-task model and feature joint learning method.
Multi-Label Classification with Feature-Aware Non-Linear Label Space Transformation
Li, Xin (Temple University) | Guo, Yuhong (Temple University)
Multi-label classification with many classes has recently drawn a lot of attention. Existing methods address this problem by performing linear label space transformation to reduce the dimension of label space, and then conducting independent regression for each reduced label dimension. These methods, however, do not capture nonlinear correlations of the multiple labels and may lead to significant information loss in the process of label space reduction. In this paper, we first propose to exploit kernel canonical correlation analysis (KCCA) to capture nonlinear label correlation information and perform nonlinear label space reduction. Then we develop a novel label space reduction method that explicitly combines linear and nonlinear label space transformations based on CCA and KCCA respectively to address multi-label classification with many classes. The proposed method is a feature-aware label transformation method that promotes the label predictability in the transformed label space from the input features. We conduct experiments on a number of multi-label classification datasets. The proposed approach demonstrates good performance compared to a number of state-of-the-art label dimension reduction methods.