Goto

Collaborating Authors

 Yang, Yang


Collaborative Filtering With Social Exposure: A Modular Approach to Social Recommendation

AAAI Conferences

This paper is concerned with how to make efficient use of social information to improve recommendations. Most existing social recommender systems assume people share similar preferences with their social friends. Which, however, may not hold true due to various motivations of making online friends and dynamics of online social networks. Inspired by recent causal process based recommendations that first model user exposures towards items and then use these exposures to guide rating prediction, we utilize social information to capture user exposures rather than user preferences. We assume that people get information of products from their online friends and they do not have to share similar preferences, which is less restrictive and seems closer to reality. Under this new assumption, in this paper, we present a novel recommendation approach (named SERec) to integrate social exposure into collaborative filtering. We propose two methods to implement SERec, namely social regularization and social boosting, each with different ways to construct social exposures. Experiments on four real-world datasets demonstrate that our methods outperform the state-of-the-art methods on top-N recommendations. Further study compares the robustness and scalability of the two proposed methods.


Deep Learning Scaling is Predictable, Empirically

arXiv.org Machine Learning

Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art. This paper presents a large scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in power-law exponents---the "steepness" of the learning curve---yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.


Representation Learning for Scale-free Networks

arXiv.org Machine Learning

Network embedding aims to learn the low-dimensional representations of vertexes in a network, while structure and inherent properties of the network is preserved. Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored. Scale-free property depicts the fact that vertex degrees follow a heavy-tailed distribution (i.e., only a few vertexes have high degrees) and is a critical property of real-world networks, such as social networks. In this paper, we study the problem of learning representations for scale-free networks. We first theoretically analyze the difficulty of embedding and reconstructing a scale-free network in the Euclidean space, by converting our problem to the sphere packing problem. Then, we propose the "degree penalty" principle for designing scale-free property preserving network embedding algorithm: punishing the proximity between high-degree vertexes. We introduce two implementations of our principle by utilizing the spectral techniques and a skip-gram model respectively. Extensive experiments on six datasets show that our algorithms are able to not only reconstruct heavy-tailed distributed degree distribution, but also outperform state-of-the-art embedding models in various network mining tasks, such as vertex classification and link prediction.


A Parallel Best-Response Algorithm with Exact Line Search for Nonconvex Sparsity-Regularized Rank Minimization

arXiv.org Machine Learning

In this paper, we propose a convergent parallel best-response algorithm with the exact line search for the nondifferentiable nonconvex sparsity-regularized rank minimization problem. On the one hand, it exhibits a faster convergence than subgradient algorithms and block coordinate descent algorithms. On the other hand, its convergence to a stationary point is guaranteed, while ADMM algorithms only converge for convex problems. Furthermore, the exact line search procedure in the proposed algorithm is performed efficiently in closed-form to avoid the meticulous choice of stepsizes, which is however a common bottleneck in subgradient algorithms and successive convex approximation algorithms. Finally, the proposed algorithm is numerically tested.


Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification

arXiv.org Machine Learning

Person Re-Identification (person re-id) is a crucial task as its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which enables the feature extractor to be aware of the current input video sequences, in a way that interdependency from the matching items can directly influence the computation of each other's representation. Specifically, the spatial pooling layer is able to select regions from each frame, while the attention temporal pooling performed can select informative frames over the sequence, both pooling guided by the information from distance matching. Experiments are conduced on the iLIDS-VID, PRID-2011 and MARS datasets and the results demonstrate that this approach outperforms existing state-of-art methods. We also analyze how the joint pooling in both dimensions can boost the person re-id performance more effectively than using either of them separately.


Unsupervised Learning of Multi-Level Descriptors for Person Re-Identification

AAAI Conferences

In this paper, we propose a novel coding method named weighted linear coding (WLC) to learn multi-level (e.g., pixel-level, patch-level and image-level) descriptors from raw pixel data in an unsupervised manner. It guarantees the property of saliency with a similarity constraint. The resulting multi-level descriptors have a good balance between the robustness and distinctiveness. Based on WLC, all data from the same region can be jointly encoded. Consequently, when we extract the holistic image features, it is able to preserve the spatial consistency. Furthermore, we apply PCA to these features and compact person representations are then achieved. During the stage of matching persons, we exploit the complementary information resided in multi-level descriptors via a score-level fusion strategy. Experiments on the challenging person re-identification datasets - VIPeR and CUHK 01, demonstrate the effectiveness of our method.


One-Step Spectral Clustering via Dynamically Learning Affinity Matrix and Subspace

AAAI Conferences

This paper proposes a one-step spectral clustering method by learning an intrinsic affinity matrix (i.e., the clustering result) from the low-dimensional space (i.e., intrinsic subspace) of original data. Specifically, the intrinsic affinitymatrix is learnt by: 1) the alignment of the initial affinity matrix learnt from original data; 2) the adjustment of the transformation matrix, which transfers the original feature space into its intrinsic subspace by simultaneously conducting feature selection and subspace learning; and 3) the clustering result constraint, i.e., the graph constructed by the intrinsic affinity matrix has exact c connected components where c is the number of clusters. In this way, two affinity matrices and a transformation matrix are iteratively updated until achieving their individual optimum, so that these two affinity matrices are consistent and the intrinsic subspace is learnt via the transformation matrix. Experimental results on both synthetic and benchmark datasets verified that our proposed method outputted more effective clustering result than the previous clustering methods.


Deep Learning for Fixed Model Reuse

AAAI Conferences

Model reuse attempts to construct a model by utilizing existing available models, mostly trained for other tasks, rather than building a model from scratch. It is helpful to reduce the time cost, data amount, and expertise required. Deep learning has achieved great success in various tasks involving images, voices and videos. There are several studies have the sense of model reuse, by trying to reuse pre-trained deep networks architectures or deep model features to train a new deep model. They, however, neglect the fact that there are many other fixed models or features available. In this paper, we propose a more thorough model reuse scheme, FMR (Fixed Model Reuse). FMR utilizes the learning power of deep models to implicitly grab the useful discriminative information from fixed model/features that have been widely used in general tasks. We firstly arrange the convolution layers of a deep network and the provided fixed model/features in parallel, fully connecting to the output layer nodes. Then, the dependencies between the output layer nodes and the fixed model/features are knockdown such that only the raw feature inputs are needed when the model is being used for testing, though the helpful information in the fixed model/features have already been incorporated into the model. On one hand, by the FMR scheme, the required amount of training data can be significantly reduced because of the reuse of fixed model/features. On the other hand, the fixed model/features are not explicitly used in testing, and thus, the scheme can be quite useful in applications where the fixed model/features are protected by patents or commercial secrets. Experiments on five real-world datasets validate the effectiveness of FMR compared with state-of-the-art deep methods.


Recurrent Attentional Topic Model

AAAI Conferences

In a document, the topic distribution of a sentence depends on both the topics of preceding sentences and its own content, and it is usually affected by the topics of the preceding sentences with different weights. It is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a Recurrent Attentional Topic Model (RATM) for document embedding. The RATM not only takes advantage of the sequential orders among sentence but also use the attention mechanism to model the relations among successive sentences. In RATM, we propose a Recurrent Attentional Bayesian Process (RABP) to handle the sequences. Based on the RABP, RATM fully utilizes the sequential information of the sentences in a document. Experiments on two copora show that our model outperforms state-of-the-art methods on document modeling and classification.


Metric Embedded Discriminative Vocabulary Learning for High-Level Person Representation

AAAI Conferences

A variety of encoding methods for bag of word (BoW) model have been proposed to encode the local features in image classification. However, most of them are unsupervised and just employ k-means to form the visual vocabulary, thus reducing the discriminative power of the features. In this paper, we propose a metric embedded discriminative vocabulary learning for high-level person representation with application to person re-identification. A new and effective term is introduced which aims at making the same persons closer while different ones farther in the metric space. With the learned vocabulary, we utilize a linear coding method to encode the image-level features (or holistic image features) for extracting high-level person representation. Different from traditional unsupervised approaches, our method can explore the relationship(same or not) among the persons. Since there is an analytic solution to the linear coding, it is easy to obtain the final high-level features. The experimental results on person re-identification demonstrate the effectiveness of our proposed algorithm.