Goto

Collaborating Authors

 Statistical Learning


Classification with Invariant Scattering Representations

arXiv.org Machine Learning

A scattering transform defines a signal representation which is invariant to translations and Lipschitz continuous relatively to deformations. It is implemented with a non-linear convolution network that iterates over wavelet and modulus operators. Lipschitz continuity locally linearizes deformations. Complex classes of signals and textures can be modeled with low-dimensional affine spaces, computed with a PCA in the scattering domain. Classification is performed with a penalized model selection. State of the art results are obtained for handwritten digit recognition over small training sets, and for texture classification.


Information-Maximization Clustering based on Squared-Loss Mutual Information

arXiv.org Machine Learning

Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.


Adaptive Semisupervised Inference

arXiv.org Machine Learning

Semisupervised methods inevitably invoke some assumption that links the marginal distribution of the features to the regression function of the label. Most commonly, the cluster or manifold assumptions are used which imply that the regression function is smooth over high-density clusters or manifolds supporting the data. A generalization of these assumptions is that the regression function is smooth with respect to some density sensitive distance. This motivates the use of a density based metric [Bousquet et al., 2004, Coif-man and Lafon, 2006, Sajama and Orlitsky, 2005] for semisupervised learning. We analyze this setting and make the following contributions - (a) we propose a semi-supervised learner that uses a density-sensitive kernel and show that it provides better performance than any supervised learner if the density support set has a small condition number and (b) we show that it is possible to adapt to the degree of semi-supervisedness using data-dependent choice of a parameter that controls sensitivity of the distance metric to the density. This ensures that the semisupervised learner never performs worse than a supervised learner even if the assumptions fail to hold.


Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey

arXiv.org Machine Learning

Probabilistic graphical models combine the graph theory and probability theory to give a multivariate statistical modeling. They provide a unified description of uncertainty using probability and complexity using the graphical model. Especially, graphical models provide the following several useful properties: - Graphical models provide a simple and intuitive interpretation of the structures of probabilistic models. On the other hand, they can be used to design and motivate new models. - Graphical models provide additional insights into the properties of the model, including the conditional independence properties. - Complex computations which are required to perform inference and learning in sophisticated models can be expressed in terms of graphical manipulations, in which the underlying mathematical expressions are carried along implicitly. The graphical models have been applied to a large number of fields, including bioinformatics, social science, control theory, image processing, marketing analysis, among others. However, structure learning for graphical models remains an open challenge, since one must cope with a combinatorial search over the space of all possible structures. In this paper, we present a comprehensive survey of the existing structure learning algorithms.


Information, learning and falsification

arXiv.org Machine Learning

There are (at least) three approaches to quantifying information. The first, algorithmic information or Kolmogorov complexity, takes events as strings and, given a universal Turing machine, quantifies the information content of a string as the length of the shortest program producing it. The second, Shannon information, takes events as belonging to ensembles and quantifies the information resulting from observing the given event in terms of the number of alternate events that have been ruled out. The third, statistical learning theory, has introduced measures of capacity that control (in part) the expected risk of classifiers. These capacities quantify the expectations regarding future data that learning algorithms embed into classifiers. This note describes a new method of quantifying information, effective information, that links algorithmic information to Shannon information, and also links both to capacities arising in statistical learning theory. After introducing the measure, we show that it provides a non-universal analog of Kolmogorov complexity. We then apply it to derive basic capacities in statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. A nice byproduct of our approach is an interpretation of the explanatory power of a learning algorithm in terms of the number of hypotheses it falsifies, counted in two different ways for the two capacities. We also discuss how effective information relates to information gain, Shannon and mutual information.


Spectral clustering based on local linear approximations

arXiv.org Machine Learning

In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface; the sample may be contaminated with outliers, which are modeled as points sampled in space away from the clusters. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. We obtain theoretical guarantees for this algorithm and show that, in terms of both separation and robustness to outliers, it outperforms the standard spectral clustering algorithm (based on pairwise distances) of Ng, Jordan and Weiss (NIPS '01). The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our theoretical purposes. We also discuss the cases of clusters of mixed dimensions and of clusters that are generated from smoother surfaces. In our experiments, this algorithm is shown to outperform pairwise spectral clustering on both simulated and real data.


A kernel-based framework for learning graded relations from data

arXiv.org Machine Learning

Driven by a large number of potential applications in areas like bioinformatics, information retrieval and social network analysis, the problem setting of inferring relations between pairs of data objects has recently been investigated quite intensively in the machine learning community. To this end, current approaches typically consider datasets containing crisp relations, so that standard classification methods can be adopted. However, relations between objects like similarities and preferences are often expressed in a graded manner in real-world applications. A general kernel-based framework for learning relations from data is introduced here. It extends existing approaches because both crisp and graded relations are considered, and it unifies existing approaches because different types of graded relations can be modeled, including symmetric and reciprocal relations. This framework establishes important links between recent developments in fuzzy set theory and machine learning. Its usefulness is demonstrated through various experiments on synthetic and real-world data.


3D Model Retrieval Based on Semantic and Shape Indexes

arXiv.org Artificial Intelligence

The size of 3D models used on the web or stored in databases is becoming increasingly high. Then, an efficient method that allows users to find similar 3D objects for a given 3D model query has become necessary. Keywords and the geometry of a 3D model cannot meet the needs of users' retrieval because they do not include the semantic information. In this paper, a new method has been proposed to 3D models retrieval using semantic concepts combined with shape indexes. To obtain these concepts, we use the machine learning methods to label 3D models by k-means algorithm in measures and shape indexes space. Moreover, semantic concepts have been organized and represented by ontology language OWL and spatial relationships are used to disambiguate among models of similar appearance. The SPARQL query language has been used to question the information displayed in this language and to compute the similarity between two 3D models. We interpret our results using the Princeton Shape Benchmark Database and the results show the performance of the proposed new approach to retrieval 3D models. Keywords: 3D Model, 3D retrieval, measures, shape indexes, semantic, ontology


Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

arXiv.org Machine Learning

In areas such as search, matching, retrieval and general data analysis, massive increase in data requires new methods that can cope well with the explosion in volume and dimensionality of the available data. In this work, the Baire metric, which is furthermore an ultrametric, is used to induce a hierarchy and in turn to support clustering, matching and other operations. Arising directly out of the Baire distance is an ultrametric tree, which also can be seen as a tree that hierarchically clusters data. This presents a number of advantages when storing and retrieving data. When the data source is in numerical form this ultrametric tree can be used as an index structure making matching and search, and thus retrieval, much easier. The clusters can be associated with hash keys, that is to say, the cluster members can be mapped onto "bins" or "buckets".


Optimal exponential bounds on the accuracy of classification

arXiv.org Machine Learning

We consider a standard binary classification problem. The performance of any binary classifier based on the training data is characterized by the excess risk. We study Bahadur's type exponential bounds on the minimax accuracy confidence function based on the excess risk. We study how this quantity depends on the complexity of the class of distributions characterized by exponents of entropies of the class of regression functions or of the class of Bayes classifiers corresponding to the distributions from the class. We also study its dependence on margin parameters of the classification problem.