Statistical Learning
PAC Optimal Exploration in Continuous Space Markov Decision Processes
Pazis, Jason (Duke University) | Parr, Ronald (Duke University)
Current exploration algorithms can be classified in two broad categories: Heuristic, and PAC optimal. While numerous researchers have used heuristic approaches such as epsilon-greedy exploration successfully, such approaches lack formal, finite sample guarantees and may need a significant amount of fine-tuning to produce good results. PAC optimal exploration algorithms, on the other hand, offer strong theoretical guarantees but are inapplicable in domains of realistic size. The goal of this paper is to bridge the gap between theory and practice, by introducing C-PACE, an algorithm which offers strong theoretical guarantees and can be applied to interesting, continuous space problems.
Discovering Hierarchical Structure for Sources and Entities
Pal, Aditya (IBM Research) | Dalvi, Nilesh (Facebook) | Bellare, Kedar (Facebook)
In this paper, we consider the problem of jointly learning hierarchies over a set of sources and entities based on their containment relationship. We model the concept of hierarchy using a set of latent binary features and propose a generative model that assigns those latent features to sources and entities in order to maximize the probability of the observed containment. To avoid fixing the number of features beforehand, we consider a non-parametric approach based on the Indian Buffet Process. The hierarchies produced by our algorithm can be used for completing missing associations and discovering structural bindings in the data. Using simulated and real datasets we provide empirical evidence of the effectiveness of the proposed approach in comparison to the existing hierarchy agnostic approaches.
Basis Adaptation for Sparse Nonlinear Reinforcement Learning
Mahadevan, Sridhar (University of Massachusetts, Amherst) | Giguere, Stephen (University of Massachusetts, Amherst) | Jacek, Nicholas (University of Massachusetts, Amherst)
This paper presents a new approach to representation discovery in reinforcement learning (RL) using basis adaptation. We introduce a general framework for basis adaptation as {\em nonlinear separable least-squares value function approximation} based on finding Frechet gradients of an error function using variable projection functionals. We then present a scalable proximal gradient-based approach for basis adaptation using the recently proposed mirror-descent framework for RL. Unlike traditional temporal-difference (TD) methods for RL, mirror descent based RL methods undertake proximal gradient updates of weights in a dual space, which is linked together with the primal space using a Legendre transform involving the gradient of a strongly convex function. Mirror descent RL can be viewed as a proximal TD algorithm using Bregman divergence as the distance generating function. We present a new class of regularized proximal-gradient based TD methods, which combine feature selection through sparse L1 regularization and basis adaptation. Experimental results are provided to illustrate and validate the approach.
Vector-Valued Multi-View Semi-Supervsed Learning for Multi-Label Image Classification
Luo, Yong (Peking University) | Tao, Dacheng (University of Technology, Sydney) | Xu, Chang (Peking University) | Li, Dongchen (Peking University) | Xu, Chao (Peking University)
Images are usually associated with multiple labels and comprised of multiple views, due to each image containing several objects (e.g. a pedestrian, bicycle and tree) and multiple visual features (e.g. color, texture and shape). Currently available tools tend to use either labels or features for classification, but both are necessary to describe the image properly. There have been recent successes in using vector-valued functions, which construct matrix-valued kernels, to explore the multi-label structure in the output space. This has motivated us to develop multi-view vector-valued manifold regularization (MV$^3$MR) in order to integrate multiple features. MV$^3$MR exploits the complementary properties of different features, and discovers the intrinsic local geometry of the compact support shared by different features, under the theme of manifold regularization. We validate the effectiveness of the proposed MV$^3$MR methodology for image classification by conducting extensive experiments on two challenge datasets, PASCAL VOC' 07 and MIR Flickr.
Large-Scale Hierarchical Classification via Stochastic Perceptron
Liu, Dehua (Zhejiang University) | Tu, Bojun (Zhejiang University) | Qian, Hui (Zhejiang University) | Zhang, Zhihua (Zhejiang University)
Hierarchical classification (HC) plays an significant role in machine learning and data mining. However, most of the state-of-the-art HC algorithms suffer from high computational costs. To improve the performance of solving, we propose a stochastic perceptron (SP) algorithm in the large margin framework. In particular, a stochastic choice procedure is devised to decide the direction of next iteration. We prove that after finite iterations the SP algorithm yields a sub-optimal solution with high probability if the input instances are separable. For large-scale and high-dimensional data sets, we reform SP to the kernel version (KSP), which dramatically reduces the memory space needed. The KSP algorithm has the merit of low space complexity as well as low time complexity. The experimental results show that our KSP approach achieves almost the same accuracy as the contemporary algorithms on the real-world data sets, but with much less CPU running time.
Walking on Minimax Paths for k-NN Search
Kim, Kye-Hyeon (Pohang University of Science and Technology) | Choi, Seungjin (Pohang University of Science and Technology)
Link-based dissimilarity measures, such as shortest path or Euclidean commute time distance, base their distance on paths between nodes of a weighted graph. These measures are known to be better suited to data manifold with nonconvex-shaped clusters, compared to Euclidean distance, so that k -nearest neighbor (NN) search is improved in such metric spaces. In this paper we present a new link-based dissimilarity measure based on minimax paths between nodes. Two main benefits of minimax path-based dissimilarity measure are: (1) only a subset of paths is considered to make it scalable, while Euclidean commute time distance considers all possible paths; (2) it better captures nonconvex-shaped cluster structure, compared to shortest path distance. We define the total cost assigned to a path between nodes as L p norm of intermediate costs of edges involving the path, showing that minimax path emerges from our L p norm over paths framework. We also define minimax distance as the intermediate cost of the longest edge on the minimax path, then present a greedy algorithm to compute k smallest minimax distances between a query and N data points in O(log N + k log k) time. Numerical experiments demonstrate that our minimax k-NN algorithm reduce the search time by several orders of magnitude, compared to existing methods, while the quality of k -NN search is significantly improved over Euclidean distance.
Unsupervised Cluster Matching via Probabilistic Latent Variable Models
Iwata, Tomoharu (NTT) | Hirao, Tsutomu (NTT) | Ueda, Naonori (NTT)
We propose a probabilistic latent variable model for unsupervised cluster matching, which is the task of finding correspondences between clusters of objects in different domains. Existing object matching methods find one-to-one matching. The proposed model finds many-to-many matching, and can handle multiple domains with different numbers of objects. The proposed model assumes that there are an infinite number of latent vectors that are shared by all domains, and that each object is generated using one of the latent vectors and a domain-specific linear projection. By inferring a latent vector to be used for generating each object, objects in different domains are clustered in shared groups, and thus we can find matching between clusters in an unsupervised manner. We present efficient inference procedures for the proposed model based on a stochastic EM algorithm. The effectiveness of the proposed model is demonstrated with experiments using synthetic and real data sets.
Supervised and Projected Sparse Coding for Image Classification
Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington)
Classic sparse representation for classification (SRC) method fails to incorporate the label information of training images, and meanwhile has a poor scalability due to the expensive computation for l_1 norm. In this paper, we propose a novel subspace sparse coding method with utilizing label information to effectively classify the images in the subspace. Our new approach unifies the tasks of dimension reduction and supervised sparse vector learning, by simultaneously preserving the data sparse structure and meanwhile seeking the optimal projection direction in the training stage, therefore accelerates the classification process in the test stage. Our method achieves both flat and structured sparsity for the vector representations, therefore making our framework more discriminative during the subspace learning and subsequent classification. The empirical results on 4 benchmark data sets demonstrate the effectiveness of our method.
Spectral Rotation versus K-Means in Spectral Clustering
Huang, Jin (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
Spectral clustering has been a popular data clustering algorithm. This category of approaches often resort to other clustering methods, such as K-Means, to get the final cluster. The potential flaw of such common practice is that the obtained relaxed continuous spectral solution could severely deviate from the true discrete solution. In this paper, we propose to impose an additional orthonormal constraint to better approximate the optimal continuous solution to the graph cut objective functions. Such a method, called spectral rotation in literature, optimizes the spectral clustering objective functions better than K -Means, and improves the clustering accuracy. We would provide efficient algorithm to solve the new problem rigorously, which is not significantly more costly than K-Means. We also establish the connection between our method andK-Means to provide theoretical motivation of our method. Experimental results show that our algorithm consistently reaches better cut and meanwhile outperforms in clustering metrics than classic spectral clustering methods.
Convex Subspace Representation Learning from Multi-View Data
Guo, Yuhong (Temple University)
Learning from multi-view data is important in many applications. In this paper, we propose a novel convex subspace representation learning method for unsupervised multi-view clustering. We first formulate the subspace learning with multiple views as a joint optimization problem with a common subspace representation matrix and a group sparsity inducing norm. By exploiting the properties of dual norms, we then show a convex min-max dual formulation with a sparsity inducing trace norm can be obtained. We develop a proximal bundle optimization algorithm to globally solve the min-max optimization problem. Our empirical study shows the proposed subspace representation learning method can effectively facilitate multi-view clustering and induce superior clustering results than alternative multi-view clustering methods.