Statistical Learning
What Is an Opinion About? Exploring Political Standpoints Using Opinion Scoring Model
Chen, Bi (Pennsylvania State University) | Zhu, Leilei (Pennsylvania State University) | Kifer, Daniel (Pennsylvania State University) | Lee, Dongwon (Pennsylvania State University)
In this paper, we propose a generative model to automatically discover the hidden associations between topic words and opinion words. By applying those discovered hidden associations, we construct opinion scoring models to extract statements which best express opinionists’ standpoints on certain topics. For experiments, we apply our model to the political area. First, we visualize the similarities and dissimilarities between Republican and Democratic senators with respect to various topics. Second, we compare the performance of the opinion scoring models with 14 kinds of methods to find the best ones. We find that sentences extracted by our opinion scoring models can effectively express opinionists’ standpoints.
Stability and Incentive Compatibility in a Kernel-Based Combinatorial Auction
Lahaie, Sebastien (Yahoo Research)
We present the design and analysis of an approximately incentive-compatible combinatorial auction. In just a single run, the auction is able to extract enough value information from bidders to compute approximate truth-inducing payments. This stands in contrast to current auction designs that need to repeat the allocation computation as many times as there are bidders to achieve incentive compatibility. The auction is formulated as a kernel method, which allows for flexibility in choosing the price structure via a kernel function. Our main result characterizes the extent to which our auction is incentive-compatible in terms of the complexity of the chosen kernel function. Our analysis of the auction's properties is based on novel insights connecting the notion of stability in statistical learning theory to that of universal competitive equilibrium in the auction literature.
Gaussian Process Latent Random Field
Zhong, Guoqiang (Chinese Academy of Sciences) | Li, Wu-Jun (The Hong Kong University of Science and Technology) | Yeung, Dit-Yan (The Hong Kong University of Science and Technology) | Hou, Xinwen (Chinese Academy of Sciences) | Liu, Cheng-Lin (Chinese Academy of Sciences)
Transductive Learning on Adaptive Graphs
Zhang, Yan-Ming (Chinese Academy of Sciences) | Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology) | Liu, Cheng-Lin (Chinese Academy of Sciences) | Hou, Xinwen (Chinese Academy of Sciences)
Graph-based semi-supervised learning methods are based on some smoothness assumption about the data. As a discrete approximation of the data manifold, the graph plays a crucial role in the success of such graph-based methods. In most existing methods, graph construction makes use of a predefined weighting function without utilizing label information even when it is available. In this work, by incorporating label information, we seek to enhance the performance of graph-based semi-supervised learning by learning the graph and label inference simultaneously. Specifically, we consider a particular setting of semi-supervised learning called transductive learning. Using the LogDet divergence to define the objective function, we propose an iterative algorithm to solve the optimization problem that has a closed-form solution at each step. We perform experiments on both synthetic and real data to demonstrate improvement in the graph and in terms of classification accuracy.
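For readers unfamiliar with the LogDet divergence mentioned above, its standard definition for two n-by-n symmetric positive definite matrices A and B is sketched below. The symbols A, B, and n are illustrative assumptions; the abstract does not state which matrices the authors compare or how the divergence enters their objective.

```latex
% Standard LogDet (Burg) matrix divergence; A, B are assumed n x n
% symmetric positive definite matrices (notation not from the abstract).
D_{\mathrm{ld}}(A, B) = \operatorname{tr}\!\left(A B^{-1}\right)
                        - \log\det\!\left(A B^{-1}\right) - n
```

The divergence is non-negative and equals zero exactly when A = B.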
Multitask Bregman Clustering
Zhang, Jianwen (Tsinghua University) | Zhang, Changshui (Tsinghua University)
Traditional clustering methods deal with a single clustering task on a single data set. However, in some newly emerging applications, multiple similar clustering tasks are involved simultaneously. In this case, we not only desire a partition for each task, but also want to discover the relationship among clusters of different tasks. It's also expected that the learnt relationship among tasks can improve the performance of each single task. In this paper, we propose a general framework for this problem and further suggest a specific approach. In our approach, we alternately update clusters and learn the relationship between clusters of different tasks, and the two phases boost each other. Our approach is based on the general Bregman divergence, hence it's suitable for a large family of assumptions on data distributions and divergences. Empirical results on several benchmark data sets validate the approach.
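As a rough illustration of why a Bregman divergence covers "a large family" of divergences, the sketch below evaluates the textbook definition D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for two generator functions. It is not the authors' multitask algorithm; the function names and the example vectors are hypothetical and chosen only for illustration.

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """Textbook Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator 1: squared Euclidean norm, which yields squared Euclidean distance.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

# Generator 2: negative entropy, which yields the (generalized) KL divergence.
# Assumes all entries of v are strictly positive.
def neg_entropy(v):
    return sum(vi * math.log(vi) for vi in v)

def neg_entropy_grad(v):
    return [math.log(vi) + 1.0 for vi in v]

# Hypothetical example points (both are probability vectors).
x = [0.2, 0.8]
y = [0.5, 0.5]

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, x, y)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)
```

Swapping the generator changes the divergence without changing the surrounding code, which is the property that lets a clustering method written against the general definition apply to several distributional assumptions at once.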
Local and Global Regressive Mapping for Manifold Learning with Out-of-Sample Extrapolation
Yang, Yi (Zhejiang University) | Nie, Feiping (University of Texas, Arlington) | Xiang, Shiming (Chinese Academy of Sciences) | Zhuang, Yueting (Zhejiang University) | Wang, Wenhua (Zhejiang University)
Over the past few years, a large family of manifold learning algorithms has been proposed and applied to various applications. While designing new manifold learning algorithms has attracted much research attention, fewer research efforts have been focused on out-of-sample extrapolation of the learned manifold. In this paper, we propose a novel manifold learning algorithm. The proposed algorithm, namely Local and Global Regressive Mapping (LGRM), employs local regression models to grasp the manifold structure. We additionally impose a global regression term as regularization to learn a model for out-of-sample data extrapolation. Based on the algorithm, we propose a new manifold learning framework. Our framework can be applied to any manifold learning algorithm to simultaneously learn the low dimensional embedding of the training data and a model which provides explicit mapping of the out-of-sample data to the learned manifold. Experiments demonstrate that the proposed framework uncovers the manifold structure precisely and can be freely applied to unseen data.
Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise
Yamada, Makoto (Tokyo Institute of Technology) | Sugiyama, Masashi (Tokyo Institute of Technology)
The discovery of non-linear causal relationships under additive non-Gaussian noise models has attracted considerable attention recently because of the high flexibility of such models. In this paper, we propose a novel causal inference algorithm called least-squares independence regression (LSIR). LSIR learns the additive noise model through minimization of an estimator of the squared-loss mutual information between inputs and residuals. A notable advantage of LSIR over existing approaches is that tuning parameters such as the kernel width and the regularization parameter can be naturally optimized by cross-validation, allowing us to avoid overfitting in a data-dependent fashion. Through experiments with real-world datasets, we show that LSIR compares favorably with the state-of-the-art causal inference method.
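The quantities named in the abstract above can be written out as follows. This is a sketch using the commonly stated forms of the additive noise model and of squared-loss mutual information; the symbols x, y, f, and e are assumed notation, and the abstract does not give the estimator that LSIR actually minimizes.

```latex
% Additive noise model: the effect y is a function of the cause x plus
% noise e that is independent of x (notation assumed for illustration).
y = f(x) + e, \qquad e \perp x

% Squared-loss mutual information between the input x and the residual e;
% it is zero if and only if x and e are statistically independent.
\mathrm{SMI}(X, E) = \frac{1}{2} \iint p(x)\, p(e)
    \left( \frac{p(x, e)}{p(x)\, p(e)} - 1 \right)^{2} \mathrm{d}x\, \mathrm{d}e
```

Under this reading, fitting f so that the residual y - f(x) has small SMI with x corresponds to searching for a regression function whose residuals look independent of the input.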
Discovering Long Range Properties of Social Networks with Multi-Valued Time-Inhomogeneous Models
Wyatt, Danny (University of Washington) | Choudhury, Tanzeem (Dartmouth College) | Bilmes, Jeff (University of Washington)
The current methods used to mine and analyze temporal social network data make two assumptions: all edges have the same strength, and all parameters are time-homogeneous. We show that those assumptions may not hold for social networks and propose an alternative model with two novel aspects: (1) the modeling of edges as multi-valued variables that can change in intensity, and (2) the use of a curved exponential family framework to capture time-inhomogeneous properties while retaining a parsimonious and interpretable model. We show that our model outperforms traditional models on two real-world social network data sets.
Discriminant Laplacian Embedding
Wang, Hua (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington)
Many real life applications brought by modern technologies often have multiple data sources, which are usually characterized by both attributes and pairwise similarities at the same time. For example, in webpage ranking, a webpage is usually represented by a vector of term values, and meanwhile the internet linkages induce pairwise similarities among the webpages. Although both attributes and pairwise similarities are useful for class membership inference, many traditional embedding algorithms only deal with one type of input data. To make use of both types of data simultaneously, in this work, we propose a novel Discriminant Laplacian Embedding (DLE) approach. Supervision information from training data is integrated into DLE to improve the discriminativity of the resulting embedding space. By solving the ambiguity problem in computing the scatter matrices caused by data points with multiple labels, we successfully extend the proposed DLE to multi-label classification. In addition, through incorporating the label correlations, the classification performance using multi-label DLE is further enhanced. Promising experimental results in extensive empirical evaluations have demonstrated the effectiveness of our approaches.
Multi-Label Learning with Weak Label
Sun, Yu-Yin (Nanjing University) | Zhang, Yin (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)
Multi-label learning deals with data associated with multiple labels simultaneously. Previous work on multi-label learning assumes that the “full” label set associated with each training instance is given by users. In many applications, however, getting the full label set for each instance is difficult and only a “partial” set of labels is available. In such cases, the appearance of a label means that the instance is associated with this label, while the absence of a label does not imply that this label is not proper for the instance. We call this kind of problem the “weak label” problem. In this paper, we propose the WELL (WEak Label Learning) method to solve the weak label problem. We consider that the classification boundary for each label should go across low density regions, and that each label generally has a much smaller number of positive examples than negative examples. The objective is formulated as a convex optimization problem which can be solved efficiently. Moreover, we exploit the correlation between labels by assuming that there is a group of low-rank base similarities, and the appropriate similarities between instances for different labels can be derived from these base similarities. Experiments validate the performance of WELL.