Country
Cluster Indicator Decomposition for Efficient Matrix Factorization
Luo, Dijun (The University of Texas at Arlington) | Ding, Chris (The University of Texas at Arlington) | Huang, Heng (The University of Texas at Arlington)
We propose a new clustering based low-rank matrix approximation method, Cluster Indicator Decomposition (CID), which yields more accurate low-rank approximations than previous commonly used singular value decomposition and other Nyström style decompositions. Our model utilizes the intrinsic structures of data and theoretically be more compact and accurate than the traditional low rank approximation approaches. The reconstruction in CID is extremely fast leading to a desirable advantage of our method in large-scale kernel machines (like Support Vector Machines) in which the reconstruction of the kernels needs to be frequently computed. Experimental results indicate that our approach compress images much more efficiently than other factorization based methods. We show that combining our method with Support Vector Machines obtains more accurate approximation and more accurate prediction while consuming much less computation resources.
Locality-Constrained Concept Factorization
Liu, Haifeng (Zhejiang University) | Yang, Zheng (Zhejiang University) | Wu, Zhaohui (Zhejiang University)
Matrix factorization based techniques, such as nonnegative matrix factorization (NMF) and concept factorization (CF), have attracted great attention in dimension reduction and data clustering. Both of them are linear learning problems and lead to a sparse representation of the data. However, the sparsity obtained by these methods does not always satisfy locality conditions, thus the obtained data representation is not the best. This paper introduces a locality-constrained concept factorization method which imposes a locality constraint onto the traditional concept factorization. By requiring the concepts (basis vectors) to be as close to the original data points as possible, each data can be represented by a linear combination of only a few basis concepts. Thus our method is able to achieve sparsity and locality at the same time. We demonstrate the effectiveness of this novel algorithm through a set of evaluations on real world applications.
Probit Classifiers with a Generalized Gaussian Scale Mixture Prior
Liu, Guoqing (Nanyang Technological University) | Wu, Jianxin (Nanyang Technological University) | Zhou, Suiping (Teesside University)
Most of the existing probit classifiers are based on sparsity-oriented modeling. However, we show that sparsity is not always desirable in practice, and only an appropriate degree of sparsity is profitable. In this work, we propose a flexible probabilistic model using a generalized Gaussian scale mixture prior that can promote an appropriate degree of sparsity for its model parameters, and yield either sparse or non-sparse estimates according to the intrinsic sparsity of features in a dataset. Model learning is carried out by an efficient modified maximum a posteriori (MAP) estimate. We also show relationships of the proposed model to existing probit classifiers as well as iteratively re-weighted l1 and l2 minimizations. Experiments demonstrate that the proposed method has better or comparable performances in feature selection for linear classifiers as well as in kernel-based classification.
Modular Community Detection in Networks
Li, Wenye (Macao Polytechnic Institute) | Schuurmans, Dale (University of Alberta)
Network community detection — the problem of dividing a network of interest into clusters for intelligent analysis — has recently attracted significant attention in diverse fields of research. To discover intrinsic community structure a quantitative measure called modularity has been widely adopted as an optimization objective. Unfortunately, modularity is inherently NP-hard to optimize and approximate solutions must be sought if tractability is to be ensured. In practice, a spectral relaxation method is most often adopted, after which a community partition is recovered from relaxed fractional values by a rounding process. In this paper, we propose an iterative rounding strategy for identifying the partition decisions that is coupled with a fast constrained power method that sequentially achieves tighter spectral relaxations. Extensive evaluation with this coupled relaxation-rounding method demonstrates consistent and sometimes dramatic improvements in the modularity of the communities discovered.
Learning Hash Functions for Cross-View Similarity Search
Kumar, Shaishav (Microsoft Research India) | Udupa, Raghavendra (Microsoft Research India)
Many applications in Multilingual and Multimodal Information Access involve searching large databases of high dimensional data objects with multiple (conditionally independent) views. In this work we consider the problem of learning hash functions for similarity search across the views for such applications. We propose a principled method for learning a hash function for each view given a set of multiview training data objects. The hash functions map similar objects to similar codes across the views thus enabling cross-view similarity search. We present results from an extensive empirical study of the proposed approach which demonstrate its effectiveness on Japanese language People Search and Multilingual People Search problems.
Incremental Slow Feature Analysis
Kompella, Varun Raj (IDSIA, Lugano) | Luciw, Matthew (IDSIA, Lugano) | Schmidhuber, Juergen (IDSIA, Lugano)
The Slow Feature Analysis (SFA) unsupervised learning framework extracts features representing the underlying causes of the changes within a temporally coherent high-dimensional raw sensory input signal. We develop the first online version of SFA, via a combination of incremental Principal Components Analysis and Minor Components Analysis. Unlike standard batch-based SFA, online SFA adapts along with non-stationary environments, which makes it a generally useful unsupervised preprocessor for autonomous learning agents. We compare online SFA to batch SFA in several experiments and show that it indeed learns without a teacher to encode the input stream by informative slow features representing meaningful abstract environmental properties. We extend online SFA to deep networks in hierarchical fashion, and use them to successfully extract abstract object position information from high-dimensional video.
Activity Recognition with Finite State Machines
Kerr, Wesley (University of Arizona) | Tran, Anh (University of Arizona) | Cohen, Paul (University of Arizona)
This paper shows how to learn general, Finite State Machine representations of activities that function as recognizers of previously unseen instances of activities. The central problem is to tell which differences between instances of activities are unimportant and may be safely ignored for the purpose of learning generalized representations of activities. We develop a novel way to find the "essential parts" of activities by a greedy kind of multiple sequence alignment, and a method to transform the resulting alignments into Finite State Machine that will accept novel instances of activities with high accuracy.
Revisiting Numerical Pattern Mining with Formal Concept Analysis
Kaytoue, Mehdi (INRIA Nancy Grand Est - LORIA) | Kuznetsov, Sergei O. (Higher School of Economics - State University) | Napoli, Amedeo (CNRS)
We investigate the problem of mining numerical data with Formal Concept Analysis. The usual way is to use a scaling procedure —transforming numerical attributes into binary ones — leading either to a loss of information or of efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to directly work on numerical data in a more precise and efficient way. For that, the notions of closed patterns, generators and equivalent classes are revisited in the numerical context. Moreover, two original algorithms are proposed and tested in an evaluation involving real-world data, showing the quality of the present approach.
Adaptation of a Mixture of Multivariate Bernoulli Distributions
Kamthe, Ankur (University of California, Merced) | Carreira-Perpinan, Miguel Angel (University of California, Merced) | Cerpa, Alberto E. (University of California, Merced)
The mixture of multivariate Bernoulli distributions (MMB) is a statistical model for high-dimensional binary data in widespread use. Recently, the MMB has been used to model the sequence of packet receptions and losses of wireless links in sensor networks. Given an MMB trained on long data traces recorded from links of a deployed network, one can then use samples from the MMB to test different routing algorithms for as long as desired. However, learning an accurate model for a new link requires collecting from it long traces over periods of hours, a costly process in practice (e.g. limited battery life). We propose an algorithm that can adapt a preexisting MMB trained with extensive data to a new link from which very limited data is available. Our approach constrains the new MMB's parameters through a nonlinear transformation of the existing MMB's parameters. The transformation has a small number of parameters that are estimated using a generalized EM algorithm with an inner loop of BFGS iterations. We demonstrate the efficacy of the approach using the MNIST dataset of handwritten digits, and wireless link data from a sensor network. We show we can learn accurate models from data traces of about 1 minute, about 10 times shorter than needed if training an MMB from scratch.
Heuristic Rule-Based Regression Via Dynamic Reduction to Classification
Janssen, Frederik (Technical University, Darmstadt) | Fürnkranz, Johannes (Technical University, Darmstadt)
In this paper, we propose a novel approach for learning regression rules by transforming the regression problem into a classification problem. Unlike previous approaches to regression by classification, in our approach the discretization of the class variable is tightly integrated into the rule learning algorithm. The key idea is to dynamically define a region around the target value predicted by the rule, and considering all examples within that region as positive and all examples outside that region as negative. In this way, conventional rule learning heuristics may be used for inducing regression rules. Our results show that our heuristic algorithm outperforms approaches that use a static discretization of the target variable, and performs en par with other comparable rule-based approaches, albeit without reaching the performance of statistical approaches.