Goto

Collaborating Authors

 Statistical Learning


Semi-Supervised Learning Using Sparse Eigenfunction Bases

AAAI Conferences

We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the cluster assumption holds, that is, when the high density regions are suf๏ฌciently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combination of such eigenvectors (or, more precisely, of their Nystrom extensions) provide good candidates for good classi๏ฌcation functions. By ๏ฌrst choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classi๏ฌer in the span of these eigenvectors, we obtain a classi๏ฌer, which has a very sparse representation in this basis. Importantly, the sparsity appears naturally from the cluster assumption. Experimental results on a number of real-world datasets show that our method is competitive with the state of the art semi-supervised learning algorithms and out-performs the natural base-line algorithm (Lasso in the Kernel PCA basis).


MiPPS: A Generative Model for Multi-Manifold Clustering

AAAI Conferences

We propose a generative model for high dimensional data consisting of intrinsically low dimensional clusters that are noisily sampled. The proposed model is a mixture of probabilistic principal surfaces (MiPPS) optimized using expectation maximization. We use a Bayesian prior on the model parameters to maximize the corresponding marginal likelihood. We also show empirically that this optimization can be biased towards a good local optimum by using our prior intuition to guide the initialization phase The proposed unsupervised algorithm naturally handles cases where the data lies on multiple connected components of a single manifold and where the component manifolds intersect. In addition to clustering, we learn a functional model for the underlying structure of each component cluster as a parameterized hyper-surface in ambient noise.This model is used to learn a global embedding that we use for visualization of the entire dataset. We demonstrate the performance of MiPPS in separating and visualizing land cover types in a hyperspectral dataset.


Interactive Learning Using Manifold Geometry

AAAI Conferences

We present an interactive learning method that enables a user to iteratively refine a regression model. The user examines the output of the model, visualized as the vertical axis of a 2D scatterplot, and provides corrections by repositioning individual data points to the correct output level. Each repositioned data point acts as a control point for altering the learned model, using the geometry underlying the data. We capture the underlying structure of the data as a manifold, on which we compute a set of basis functions as the foundation for learning. Our results show that manifold-based interactive learning achieves dramatic improvement over alternative approaches.


Predicting and Controlling System-Level Parameters of Multi-Agent Systems

AAAI Conferences

Boid flocking is a system in which several individual agents follow three simple rules to generate swarm-level flocking behavior. To control this system, the user must adjust the agent program parameters, which indirectly modifies the flocking behavior. This is unintuitive because the properties of the flocking behavior are non-explicit in the agent program. In this paper, we discuss a domain-independent approach for detecting and controlling two emergent properties of boids: density and a qualitative threshold effect of swarming vs. flocking. Also, we discuss the possibility of applying this approach to detecting and controlling traffic jams in traffic simulations.


ParamILS: An Automatic Algorithm Configuration Framework

Journal of Artificial Intelligence Research

The identification of performance-optimizing parameter settings is an important part of the development and application of algorithms. We describe an automatic framework for this algorithm configuration problem. More formally, we provide methods for optimizing a target algorithms performance on a given class of problem instances by varying a set of ordinal and/or categorical parameters. We review a family of local-search-based algorithm configuration procedures and present novel techniques for accelerating them by adaptively limiting the time spent for evaluating individual configurations. We describe the results of a comprehensive experimental evaluation of our methods, based on the configuration of prominent complete and incomplete algorithms for SAT. We also present what is, to our knowledge, the first published work on automatically configuring the CPLEX mixed integer programming solver. All the algorithms we considered had default parameter settings that were manually identified with considerable effort. Nevertheless, using our automated algorithm configuration procedures, we achieved substantial and consistent performance improvements.


Content Modeling Using Latent Permutations

Journal of Artificial Intelligence Research

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.


Sparsification and feature selection by compressive linear regression

arXiv.org Machine Learning

The Minimum Description Length (MDL) principle states that the optimal model for a given data set is that which compresses it best. Due to practial limitations the model can be restricted to a class such as linear regression models, which we address in this study. As in other formulations such as the LASSO and forward step-wise regression we are interested in sparsifying the feature set while preserving generalization ability. We derive a well-principled set of codes for both parameters and error residuals along with smooth approximations to lengths of these codes as to allow gradient descent optimization of description length, and go on to show that sparsification and feature selection using our approach is faster than the LASSO on several datasets from the UCI and StatLib repositories, with favorable generalization accuracy, while being fully automatic, requiring neither cross-validation nor tuning of regularization hyper-parameters, allowing even for a nonlinear expansion of the feature set followed by sparsification.


YQX Plays Chopin

AI Magazine

The article is about AI research in the context of a complex artistic behavior: expressive music performance. A computer program is presented that learns to play piano with 'expression' and that even won an international computer piano performance contest. A superficial analysis of an expressive performance generated by the system seems to suggest creative musical abilities. After a critical discussion of the processes underlying this behavior, we abandon the question of whether the system is really creative, and turn to the true motivation that drives this research: to use AI methods to investigate and better understand music performance as a human creative behavior. A number of recent and current results from our research are briefly presented that indicate that machines can give us interesting insights into such a complex creative behavior, even if they may not be creative themselves.


Algorithms for Image Analysis and Combination of Pattern Classifiers with Application to Medical Diagnosis

arXiv.org Artificial Intelligence

Medical Informatics and the application of modern signal processing in the assistance of the diagnostic process in medical imaging is one of the more recent and active research areas today. This thesis addresses a variety of issues related to the general problem of medical image analysis, specifically in mammography, and presents a series of algorithms and design approaches for all the intermediate levels of a modern system for computer-aided diagnosis (CAD). The diagnostic problem is analyzed with a systematic approach, first defining the imaging characteristics and features that are relevant to probable pathology in mammo-grams. Next, these features are quantified and fused into new, integrated radio-logical systems that exhibit embedded digital signal processing, in order to improve the final result and minimize the radiological dose for the patient. In a higher level, special algorithms are designed for detecting and encoding these clinically interest-ing imaging features, in order to be used as input to advanced pattern classifiers and machine learning models. Finally, these approaches are extended in multi-classifier models under the scope of Game Theory and optimum collective deci-sion, in order to produce efficient solutions for combining classifiers with minimum computational costs for advanced diagnostic systems. The material covered in this thesis is related to a total of 18 published papers, 6 in scientific journals and 12 in international conferences.


$L_0$ regularized estimation for nonlinear models that have sparse underlying linear structures

arXiv.org Machine Learning

We study the estimation of $\beta$ for the nonlinear model $y = f(X\sp{\top}\beta) + \epsilon$ when $f$ is a nonlinear transformation that is known, $\beta$ has sparse nonzero coordinates, and the number of observations can be much smaller than that of parameters ($n\ll p$). We show that in order to bound the $L_2$ error of the $L_0$ regularized estimator $\hat\beta$, i.e., $\|\hat\beta - \beta\|_2$, it is sufficient to establish two conditions. Based on this, we obtain bounds of the $L_2$ error for (1) $L_0$ regularized maximum likelihood estimation (MLE) for exponential linear models and (2) $L_0$ regularized least square (LS) regression for the more general case where $f$ is analytic. For the analytic case, we rely on power series expansion of $f$, which requires taking into account the singularities of $f$.