Goto

Collaborating Authors

 Statistical Learning


Machine Learning, Clustering, and Polymorphy

arXiv.org Artificial Intelligence

This paper describes a machine induction program (WITT) that attempts to model human categorization. Properties of categories to which human subjects are sensitive includes best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary or sufficient features). This approach represents an alternative to usual Artificial Intelligence approaches to generalization and conceptual clustering which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed.


Uncertain Reasoning Using Maximum Entropy Inference

arXiv.org Artificial Intelligence

The use of maximum entropy inference in reasoning with uncertain information is commonly justified by an information-theoretic argument. This paper discusses a possible objection to this information-theoretic justification and shows how it can be met. I then compare maximum entropy inference with certain other currently popular methods for uncertain reasoning. In making such a comparison, one must distinguish between static and dynamic theories of degrees of belief: a static theory concerns the consistency conditions for degrees of belief at a given time; whereas a dynamic theory concerns how one's degrees of belief should change in the light of new information. It is argued that maximum entropy is a dynamic theory and that a complete theory of uncertain reasoning can be gotten by combining maximum entropy inference with probability theory, which is a static theory. This total theory, I argue, is much better grounded than are other theories of uncertain reasoning.


On Sparsity Inducing Regularization Methods for Machine Learning

arXiv.org Machine Learning

During the past years there has been an explosion of interest in learning methods based on sparsity regularization. In this paper, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function $\omega$ with a linear function. This setting includes several methods such the group Lasso, the Fused Lasso, multi-task learning and many more. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function $\omega$ is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by the groundbreaking work of Vladimir Vapnik.


Poisoning Attacks against Support Vector Machines

arXiv.org Machine Learning

We investigate a family of poisoning attacks against Support Vector Machines (SVM). Such attacks inject specially crafted training data that increases the SVM's test error. Central to the motivation for these attacks is the fact that most learning algorithms assume that their training data comes from a natural or well-behaved distribution. However, this assumption does not generally hold in security-sensitive settings. As we demonstrate, an intelligent adversary can, to some extent, predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data. The proposed attack uses a gradient ascent strategy in which the gradient is computed based on properties of the SVM's optimal solution. This method can be kernelized and enables the attack to be constructed in the input space even for non-linear kernels. We experimentally demonstrate that our gradient ascent procedure reliably identifies good local maxima of the non-convex validation error surface, which significantly increases the classifier's test error.


Convex Tensor Decomposition via Structured Schatten Norm Regularization

arXiv.org Machine Learning

We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured sparsity. Based on the properties of the structured Schatten norms, we mathematically analyze the performance of "latent" approach for tensor decomposition, which was empirically found to perform better than the "overlapped" approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific mode, this approach performs as good as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structures Schatten norms, establish the consistency, and discuss the identifiability of this approach. We confirm through numerical simulations that our theoretical prediction can precisely predict the scaling behaviour of the mean squared error.


Regression for sets of polynomial equations

arXiv.org Machine Learning

We propose a method called ideal regression for approximating an arbitrary system of polynomial equations by a system of a particular type. Using techniques from approximate computational algebraic geometry, we show how we can solve ideal regression directly without resorting to numerical optimization. Ideal regression is useful whenever the solution to a learning problem can be described by a system of polynomial equations. As an example, we demonstrate how to formulate Stationary Subspace Analysis (SSA), a source separation problem, in terms of ideal regression, which also yields a consistent estimator for SSA. We then compare this estimator in simulations with previous optimization-based approaches for SSA.


Generalizing k-means for an arbitrary distance matrix

arXiv.org Machine Learning

The original k-means clustering method works only if the exact vectors representing the data points are known. Therefore calculating the distances from the centroids needs vector operations, since the average of abstract data points is undefined. Existing algorithms can be extended for those cases when the sole input is the distance matrix, and the exact representing vectors are unknown. This extension may be named relational k-means after a notation for a similar algorithm invented for fuzzy clustering. A method is then proposed for generalizing k-means for scenarios when the data points have absolutely no connection with a Euclidean space.


On Learnability, Complexity and Stability

arXiv.org Machine Learning

A key question in statistical learning is which hypotheses (function) spaces are learnable. Roughly speaking, a hypotheses space is learnable if there is a consistent learning algorithm, i.e. one returning an optimal solution as the number of sample goes to infinity. Classic results for supervised learning characterize learnability of a function class in terms of its complexity (combinatorial dimension) [17, 16, 1, 2, 9, 3]. Indeed, minimization of the empirical risk on a function class having finite complexity can be shown to be consistent. A key aspect in this approach is the connection with empirical process theory results showing that finite combinatorial dimensions characterize function classes for which a uniform law of large numbers holds, namely uniform Glivenko-Cantelli classes [7].


A Diffusion Process on Riemannian Manifold for Visual Tracking

arXiv.org Machine Learning

Robust visual tracking for long video sequences is a research area that has many important applications. The main challenges include how the target image can be modeled and how this model can be updated. In this paper, we model the target using a covariance descriptor, as this descriptor is robust to problems such as pixel-pixel misalignment, pose and illumination changes, that commonly occur in visual tracking. We model the changes in the template using a generative process. We introduce a new dynamical model for the template update using a random walk on the Riemannian manifold where the covariance descriptors lie in. This is done using log-transformed space of the manifold to free the constraints imposed inherently by positive semidefinite matrices. Modeling template variations and poses kinetics together in the state space enables us to jointly quantify the uncertainties relating to the kinematic states and the template in a principled way. Finally, the sequential inference of the posterior distribution of the kinematic states and the template is done using a particle filter. Our results shows that this principled approach can be robust to changes in illumination, poses and spatial affine transformation. In the experiments, our method outperformed the current state-of-the-art algorithm - the incremental Principal Component Analysis method, particularly when a target underwent fast poses changes and also maintained a comparable performance in stable target tracking cases.


Robust and Trend Following Student's t Kalman Smoothers

arXiv.org Machine Learning

We present a Kalman smoothing framework based on modeling errors using the heavy tailed Student's t distribution, along with algorithms, convergence theory, open-source general implementation, and several important applications. The computational effort per iteration grows linearly with the length of the time series, and all smoothers allow nonlinear process and measurement models. Robust smoothers form an important subclass of smoothers within this framework. These smoothers work in situations where measurements are highly contaminated by noise or include data unexplained by the forward model. Highly robust smoothers are developed by modeling measurement errors using the Student's t distribution, and outperform the recently proposed L1-Laplace smoother in extreme situations with data containing 20% or more outliers. A second special application we consider in detail allows tracking sudden changes in the state. It is developed by modeling process noise using the Student's t distribution, and the resulting smoother can track sudden changes in the state. These features can be used separately or in tandem, and we present a general smoother algorithm and open source implementation, together with convergence analysis that covers a wide range of smoothers. A key ingredient of our approach is a technique to deal with the non-convexity of the Student's t loss function. Numerical results for linear and nonlinear models illustrate the performance of the new smoothers for robust and tracking applications, as well as for mixed problems that have both types of features.