Goto

Collaborating Authors

 Technology


Model-Consistent Sparse Estimation through the Bootstrap

arXiv.org Machine Learning

We consider the least-square linear regression problem with regularization by the $\ell^1$-norm, a problem usually referred to as the Lasso. In this paper, we first present a detailed asymptotic analysis of model consistency of the Lasso in low-dimensional settings. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection. For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection procedure, referred to as the Bolasso, is extended to high-dimensional settings by a provably consistent two-step procedure.


On finitely recursive programs

arXiv.org Artificial Intelligence

Disjunctive finitary programs are a class of logic programs admitting function symbols and hence infinite domains. They have very good computational properties, for example ground queries are decidable while in the general case the stable model semantics is highly undecidable. In this paper we prove that a larger class of programs, called finitely recursive programs, preserves most of the good properties of finitary programs under the stable model semantics, namely: (i) finitely recursive programs enjoy a compactness property; (ii) inconsistency checking and skeptical reasoning are semidecidable; (iii) skeptical resolution is complete for normal finitely recursive programs. Moreover, we show how to check inconsistency and answer skeptical queries using finite subsets of the ground program instantiation. We achieve this by extending the splitting sequence theorem by Lifschitz and Turner: We prove that if the input program P is finitely recursive, then the partial stable models determined by any smooth splitting omega-sequence converge to a stable model of P.


State Space Realization Theorems For Data Mining

arXiv.org Machine Learning

In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for these formal series using the language and tools of Hopf algebras.


Sparse Causal Discovery in Multivariate Time Series

arXiv.org Machine Learning

Our goal is to estimate causal interactions in multivariate time series. Using vector autoregressive (VAR) models, these can be defined based on non-vanishing coefficients belonging to respective time-lagged instances. As in most cases a parsimonious causality structure is assumed, a promising approach to causal discovery consists in fitting VAR models with an additional sparsity-promoting regularization. Along this line we here propose that sparsity should be enforced for the subgroups of coefficients that belong to each pair of time series, as the absence of a causal relation requires the coefficients for all time-lags to become jointly zero. Such behavior can be achieved by means of l1-l2-norm regularized regression, for which an efficient active set solver has been proposed recently. Our method is shown to outperform standard methods in recovering simulated causality graphs. The results are on par with a second novel approach which uses multiple statistical testing.


On Introspection, Metacognitive Control and Augmented Data Mining Live Cycles

arXiv.org Artificial Intelligence

We discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, reason about their actions, and to adapt to new situations. In this respect, we propose implementation details of a knowledge taxonomy and an augmented data mining life cycle which supports a live integration of obtained models.


Foundations of a Multi-way Spectral Clustering Framework for Hybrid Linear Modeling

arXiv.org Machine Learning

The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem, however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem, and provides careful analysis to justify it. The TSCC algorithm is practically a combination of Govindu's multi-way spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments well the different underlying clusters. The goodness of clustering depends on the within-cluster errors, the between-clusters interaction, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001).


Differential Privacy with Compression

arXiv.org Machine Learning

This work studies formal utility and privacy guarantees for a simple multiplicative database transformation, where the data are compressed by a random linear or affine transformation, reducing the number of data records substantially, while preserving the number of original input variables. We provide an analysis framework inspired by a recent concept known as differential privacy (Dwork 06). Our goal is to show that, despite the general difficulty of achieving the differential privacy guarantee, it is possible to publish synthetic data that are useful for a number of common statistical learning applications. This includes high dimensional sparse regression (Zhou et al. 07), principal component analysis (PCA), and other statistical measures (Liu et al. 06) based on the covariance of the initial data.


PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

arXiv.org Machine Learning

The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. A particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to control the risk of the latter. These results allow to bound the risk of very general estimation procedures, as well as to perform model selection.


Logical Algorithms meets CHR: A meta-complexity result for Constraint Handling Rules with rule priorities

arXiv.org Artificial Intelligence

This paper investigates the relationship between the Logical Algorithms language (LA) of Ganzinger and McAllester and Constraint Handling Rules (CHR). We present a translation schema from LA to CHR-rp: CHR with rule priorities, and show that the meta-complexity theorem for LA can be applied to a subset of CHR-rp via inverse translation. Inspired by the high-level implementation proposal for Logical Algorithm by Ganzinger and McAllester and based on a new scheduling algorithm, we propose an alternative implementation for CHR-rp that gives strong complexity guarantees and results in a new and accurate meta-complexity theorem for CHR-rp. It is furthermore shown that the translation from Logical Algorithms to CHR-rp combined with the new CHR-rp implementation, satisfies the required complexity for the Logical Algorithms meta-complexity result to hold.


N-norm and N-conorm in Neutrosophic Logic and Set, and the Neutrosophic Topologies

arXiv.org Artificial Intelligence

In this paper we present the N-norms/N-conorms in neutrosophic logic and set as extensions of T-norms/T-conorms in fuzzy logic and set. Also, as an extension of the Intuitionistic Fuzzy Topology we present the Neutrosophic Topologies.