Goto

Collaborating Authors

 Genre


Challenges in Representation Learning: A report on three machine learning contests

arXiv.org Machine Learning

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.


Gaussian Process Conditional Copulas with Applications to Financial Time Series

arXiv.org Machine Learning

The estimation of dependencies between multiple variables is a central problem in the analysis of financial time series. A common approach is to express these dependencies in terms of a copula function. Typically the copula function is assumed to be constant but this may be inaccurate when there are covariates that could have a large influence on the dependence structure of the data. To account for this, a Bayesian framework for the estimation of conditional copulas is proposed. In this framework the parameters of a copula are non-linearly related to some arbitrary conditioning variables. We evaluate the ability of our method to predict time-varying dependencies on several equities and currencies and observe consistent performance gains compared to static copula models and other time-varying copula methods.


Dimensionality Detection and Integration of Multiple Data Sources via the GP-LVM

arXiv.org Machine Learning

The Gaussian Process Latent Variable Model (GP-LVM) is a non-linear probabilistic method of embedding a high dimensional dataset in terms low dimensional `latent' variables. In this paper we illustrate that maximum a posteriori (MAP) estimation of the latent variables and hyperparameters can be used for model selection and hence we can determine the optimal number or latent variables and the most appropriate model. This is an alternative to the variational approaches developed recently and may be useful when we want to use a non-Gaussian prior or kernel functions that don't have automatic relevance determination (ARD) parameters. Using a second order expansion of the latent variable posterior we can marginalise the latent variables and obtain an estimate for the hyperparameter posterior. Secondly, we use the GP-LVM to integrate multiple data sources by simultaneously embedding them in terms of common latent variables. We present results from synthetic data to illustrate the successful detection and retrieval of low dimensional structure from high dimensional data. We demonstrate that the integration of multiple data sources leads to more robust performance. Finally, we show that when the data are used for binary classification tasks we can attain a significant gain in prediction accuracy when the low dimensional representation is used.


A Counterexample for the Validity of Using Nuclear Norm as a Convex Surrogate of Rank

arXiv.org Machine Learning

Rank minimization has attracted a lot of attention due to its robustness in data recovery. To overcome the computational difficulty, rank is often replaced with nuclear norm. For several rank minimization problems, such a replacement has been theoretically proven to be valid, i.e., the solution to nuclear norm minimization problem is also the solution to rank minimization problem. Although it is easy to believe that such a replacement may not always be valid, no concrete example has ever been found. We argue that such a validity checking cannot be done by numerical computation and show, by analyzing the noiseless latent low rank representation (LatLRR) model, that even for very simple rank minimization problems the validity may still break down. As a by-product, we find that the solution to the nuclear norm minimization formulation of LatLRR is non-unique. Hence the results of LatLRR reported in the literature may be questionable.


Syntactic sensitive complexity for symbol-free sequence

arXiv.org Artificial Intelligence

Complexity of the text has been developed with varying degrees of success, [2][3]. This work devised a novel measure based on L-system [1] that can compute the structural complexity of a text sequence. Given a text, we first transform it into a binary string. Then, use L-system to model the tree structure of this string and get its structural complexity. We will introduce how to use L-system to model the string in this section. The measure of complexity for the text sequence is included in the next section.


Solving Weighted Voting Game Design Problems Optimally: Representations, Synthesis, and Enumeration

arXiv.org Artificial Intelligence

We study the inverse power index problem for weighted voting games: the problem of finding a weighted voting game in which the power of the players is as close as possible to a certain target distribution. Our goal is to find algorithms that solve this problem exactly. Thereto, we study various subclasses of simple games, and their associated representation methods. We survey algorithms and impossibility results for the synthesis problem, i.e., converting a representation of a simple game into another representation. We contribute to the synthesis problem by showing that it is impossible to compute in polynomial time the list of ceiling coalitions (also known as shift-maximal losing coalitions) of a game from its list of roof coalitions (also known as shift-minimal winning coalitions), and vice versa. Then, we proceed by studying the problem of enumerating the set of weighted voting games. We present first a naive algorithm for this, running in doubly exponential time. Using our knowledge of the synthesis problem, we then improve on this naive algorithm, and we obtain an enumeration algorithm that runs in quadratic exponential time (that is, O(2^(n^2) p(n)) for a polynomial p). Moreover, we show that this algorithm runs in output-polynomial time, making it the best possible enumeration algorithm up to a polynomial factor. Finally, we propose an exact anytime algorithm for the inverse power index problem that runs in exponential time. This algorithm is straightforward and general: it computes the error for each game enumerated, and outputs the game that minimizes this error. By the genericity of our approach, our algorithm can be used to find a weighted voting game that optimizes any exponential time computable function. We implement our algorithm for the case of the normalized Banzhaf index, and we perform experiments in order to study performance and error convergence.


Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

arXiv.org Artificial Intelligence

Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification using the hierarchy in different ways. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and how the proposed methods overcome most of these methods across a range of cases.


Semi-supervised clustering methods

arXiv.org Machine Learning

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.


Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models

arXiv.org Machine Learning

We study sparse principal component analysis for high dimensional vector autoregressive time series under a doubly asymptotic framework, which allows the dimension $d$ to scale with the series length $T$. We treat the transition matrix of time series as a nuisance parameter and directly apply sparse principal component analysis on multivariate time series as if the data are independent. We provide explicit non-asymptotic rates of convergence for leading eigenvector estimation and extend this result to principal subspace estimation. Our analysis illustrates that the spectral norm of the transition matrix plays an essential role in determining the final rates. We also characterize sufficient conditions under which sparse principal component analysis attains the optimal parametric rate. Our theoretical results are backed up by thorough numerical studies.


Concentration and Confidence for Discrete Bayesian Sequence Predictors

arXiv.org Machine Learning

Bayesian sequence prediction is a simple technique for predicting future symbols sampled from an unknown measure on infinite sequences over a countable alphabet. While strong bounds on the expected cumulative error are known, there are only limited results on the distribution of this error. We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence. We also consider the problem of constructing upper confidence bounds on the KL and Hellinger errors similar to those constructed from Hoeffding-like bounds in the i.i.d. case. The new results are applied to show that Bayesian sequence prediction can be used in the Knows What It Knows (KWIK) framework with bounds that match the state-of-the-art.