Technology
Proximity Graphs for Clustering and Manifold Learning
Zemel, Richard S., Carreira-Perpiñán, Miguel Á.
Many machine learning algorithms for clustering or dimensionality reduction takeas input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering)or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but not much on constructing the graph itself. Graphs typically used include the fullyconnected graph,a local fixed-grid graph (for image segmentation) or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces abetter representation of the data manifold than standard methods; and that it provides robustness to a subsequent clustering or dimensionality reductionalgorithm based on the graph.
Dependent Gaussian Processes
Gaussian processes are usually parameterised in terms of their covariance functions.However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise thekernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs.
Convergence and No-Regret in Multiagent Learning
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment isno longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluationcriteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games.We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical resultsin a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
Markov Networks for Detecting Overalpping Elements in Sequence Data
Craven, Mark, Bockhorst, Joseph
Many sequential prediction tasks involve locating instances of patterns insequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarilyoverlapping elements. We show how to efficiently train and perform inference with these models. Experimental results froma genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.
Hierarchical Distributed Representations for Statistical Language Modeling
Blitzer, John, Pereira, Fernando, Weinberger, Kilian Q., Saul, Lawrence K.
Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution overpredicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary anda training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.
Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis
Blaschke, Tobias, Wiskott, Laurenz
In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original sourcesignal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives ofstatistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. Theprinciple of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data.
Support Vector Classification with Input Data Uncertainty
This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty ininput data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input.
At the Edge of Chaos: Real-time Computations and Self-Organized Criticality in Recurrent Neural Networks
Bertschinger, Nils, Natschläger, Thomas, Legenstein, Robert A.
In this paper we analyze the relationship between the computational capabilities ofrandomly connected networks of threshold gates in the timeseries domain and their dynamical properties. In particular we propose a complexity measure which we find to assume its highest values near the edge of chaos, i.e. the transition from ordered to chaotic dynamics. Furthermore we show that the proposed complexity measure predicts the computational capabilities very well: only near the edge of chaos are such networks able to perform complex computations on time series. Additionally asimple synaptic scaling rule for self-organized criticality is presented and analyzed.