Goto

Collaborating Authors

 Genre


On the conditions used to prove oracle results for the Lasso

arXiv.org Machine Learning

Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds for more general situations than what it appears from coherence (Bunea et al., 2007b,c) or restricted isometry (Candès and Tao, 2005) assumptions. Keywords and phrases: Coherence, compatibility, irrepresentable condition, Lasso, restricted eigenvalue, restricted isometry, sparsity.


Pre-processing in AI based Prediction of QSARs

arXiv.org Artificial Intelligence

Machine learning, data mining and artificial intelligence (AI) based methods have been used to determine the relations between chemical structure and biological activity, called quantitative structure activity relationships (QSARs) for the compounds. Pre-processing of the dataset, which includes the mapping from a large number of molecular descriptors in the original high dimensional space to a small number of components in the lower dimensional space while retaining the features of the original data, is the first step in this process. A common practice is to use a mapping method for a dataset without prior analysis. This pre-analysis has been stressed in our work by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods on each of the datasets. Based on this analysis, we conclude the nature of the inherent relationships between the elements of each dataset, and hence, the mapping method best suited for it. We also show that proper preprocessing can help us in choosing the right feature extraction tool as well as give an insight about the type of classifier pertinent for the given problem.


A path algorithm for the Fused Lasso Signal Approximator

arXiv.org Machine Learning

The Lasso is a very well known penalized regression model, which adds an $L_{1}$ penalty with parameter $\lambda_{1}$ on the coefficients to the squared error loss function. The Fused Lasso extends this model by also putting an $L_{1}$ penalty with parameter $\lambda_{2}$ on the difference of neighboring coefficients, assuming there is a natural ordering. In this paper, we develop a fast path algorithm for solving the Fused Lasso Signal Approximator that computes the solutions for all values of $\lambda_1$ and $\lambda_2$. In the supplement, we also give an algorithm for the general Fused Lasso for the case with predictor matrix $\bX \in \mathds{R}^{n \times p}$ with $\text{rank}(\bX)=p$.


Expectation Propagation on the Maximum of Correlated Normal Variables

arXiv.org Machine Learning

Many inference problems involving questions of optimality ask for the maximum or the minimum of a finite set of unknown quantities. This technical report derives the first two posterior moments of the maximum of two correlated Gaussian variables and the first two posterior moments of the two generating variables (corresponding to Gaussian approximations minimizing relative entropy). It is shown how this can be used to build a heuristic approximation to the maximum relationship over a finite set of Gaussian variables, allowing approximate inference by Expectation Propagation on such quantities.


Laplacian Support Vector Machines Trained in the Primal

arXiv.org Machine Learning

In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop better understanding and improve the quality of classifiers exploiting unlabeled data. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown the state of the art performance in semi--supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. Whereas training a LapSVM in the dual requires two steps, using the primal form allows us to collapse training to a single step. Moreover, the computational complexity of the training algorithm is reduced from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the combined number of labeled and unlabeled examples. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large datasets. We present an extensive experimental evaluation on real world data showing the benefits of the proposed approach.


Initialization Free Graph Based Clustering

arXiv.org Machine Learning

This paper proposes an original approach to cluster multi-component data sets, including an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm, and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding the Prim's trajectory. The corresponding cluster centroids are then computed in order to initialize the generalized Lloyd's algorithm, also known as $K$-means, which allows to circumvent initialization problems. Some results are derived for evaluating the false positive rate of our cluster detection algorithm, with the help of approximations relevant in Euclidean spaces. Metrics used for measuring similarity between multi-dimensional data points are based on symmetrical divergences. The use of these informational divergences together with the proposed method leads to better results, compared to other clustering methods for the problem of astrophysical data processing. Some applications of this method in the multi/hyper-spectral imagery domain to a satellite view of Paris and to an image of the Mars planet are also presented. In order to demonstrate the usefulness of divergences in our problem, the method with informational divergence as similarity measure is compared with the same method using classical metrics. In the astrophysics application, we also compare the method with the spectral clustering algorithms.


Telling cause from effect based on high-dimensional observations

arXiv.org Machine Learning

We describe a method for inferring linear causal relations among multi-dimensional variables. The idea is to use an asymmetry between the distributions of cause and effect that occurs if both the covariance matrix of the cause and the structure matrix mapping cause to the effect are independently chosen. The method works for both stochastic and deterministic causal relations, provided that the dimensionality is sufficiently high (in some experiments, 5 was enough). It is applicable to Gaussian as well as non-Gaussian data.


Discrete MDL Predicts in Total Variation

arXiv.org Machine Learning

The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.


Dealing with incomplete agents' preferences and an uncertain agenda in group decision making via sequential majority voting

arXiv.org Artificial Intelligence

We consider multi-agent systems where agents' preferences are aggregated via sequential majority voting: each decision is taken by performing a sequence of pairwise comparisons where each comparison is a weighted majority vote among the agents. Incompleteness in the agents' preferences is common in many real-life settings due to privacy issues or an ongoing elicitation process. In addition, there may be uncertainty about how the preferences are aggregated. For example, the agenda (a tree whose leaves are labelled with the decisions being compared) may not yet be known or fixed. We therefore study how to determine collectively optimal decisions (also called winners) when preferences may be incomplete, and when the agenda may be uncertain. We show that it is computationally easy to determine if a candidate decision always wins, or may win, whatever the agenda. On the other hand, it is computationally hard to know wheth er a candidate decision wins in at least one agenda for at least one completion of the agents' preferences. These results hold even if the agenda must be balanced so that each candidate decision faces the same number of majority votes. Such results are useful for reasoning about preference elicitation. They help understand the complexity of tasks such as determining if a decision can be taken collectively, as well as knowing if the winner can be manipulated by appropriately ordering the agenda.


A Semantics for HTN Methods

AAAI Conferences

Despite the extensive development of first-principles planning in recent years, planning applications are still primarily developed using knowledge-based planners which can exploit domain-specific heuristics and weaker domain models.  Hierarchical Task Network (HTN) planners capture domain-specific heuristics for more efficient search, accommodate incomplete causal models, and can be used to enforce standard operating procedures.  Unfortunately, we do not have semantics for the methods or tasks that make up HTN models, that help evaluate the correctness of methods, or to build a reliable executive for HTN plans.  This paper fills the gap by providing a well-defined semantics for the methods and plans of SHOP2, a state-of-the-art HTN planner.  The semantics are defined in terms of concurrent golog (ConGolog) and the situation calculus.  We provide a proof of equivalence between the plans generated by SHOP2 and the action sequences of the ConGolog semantics.  We show how the semantics reflects the distinction between plan-time and execution-time, and provide some simple examples showing how the semantics can support method verification.  The semantics provide an implementation-neutral specification for an executive, showing how an executive must treat the plans SHOP2 generates in order to enforce the expected behaviors.  Future directions include automated verification of method specifications, automatically generating plan monitors, and plan revision and repair.