

P-values for high-dimensional regression

arXiv.org Machine Learning

Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts: the number of variables is reduced to a manageable size using the first part, and classical variable selection techniques are then applied to the remaining variables on the second part. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data, and results are sensitive to this arbitrary choice: it amounts to a 'p-value lottery' and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used to control both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables.
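For concreteness, the single-split procedure and a simple aggregation rule can be sketched in Python. This is a minimal illustration, not the paper's full procedure: the lasso screener, the Bonferroni correction over the selected set, and the fixed-quantile (gamma = 1/2) aggregation are assumptions chosen for brevity.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

def multi_split_pvalues(X, y, n_splits=50, seed=0):
    """Screen on one half with the lasso, test on the other with OLS,
    then aggregate each variable's p-values across random splits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pvals = np.ones((n_splits, p))
    for b in range(n_splits):
        idx = rng.permutation(n)
        i1, i2 = idx[:n // 2], idx[n // 2:]
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[i1], y[i1]).coef_)
        if 0 < sel.size < len(i2) - 1:
            ols = sm.OLS(y[i2], sm.add_constant(X[i2][:, sel])).fit()
            # Bonferroni over the selected set, as in single-split inference
            pvals[b, sel] = np.minimum(ols.pvalues[1:] * sel.size, 1.0)
    # quantile aggregation at gamma = 1/2: twice the per-variable median
    return np.minimum(2.0 * np.median(pvals, axis=0), 1.0)

Variables never selected in a split keep a p-value of one for that split, which is the conservative convention that makes the aggregation valid.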


Regularization methods for learning incomplete matrices

arXiv.org Machine Learning

We use convex relaxation techniques to provide a sequence of solutions to the matrix completion problem. Using the nuclear norm as a regularizer, we provide simple and very efficient algorithms for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions.
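The iteration described above is short enough to sketch directly. The following is a minimal Python illustration under assumed defaults (dense SVD, fixed tolerance), not the authors' optimized implementation:

import numpy as np

def soft_impute(X, mask, lam, n_iter=200, tol=1e-5, Z0=None):
    """Fill missing entries of X (mask marks observed cells) by repeatedly
    soft-thresholding the singular values of the completed matrix."""
    Z = np.zeros_like(X, dtype=float) if Z0 is None else Z0.copy()
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)                 # observed data + current fit
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt   # shrink the spectrum by lam
        if np.linalg.norm(Z_new - Z) <= tol * max(np.linalg.norm(Z), 1.0):
            return Z_new
        Z = Z_new
    return Z

Sweeping lam from large to small and passing each solution in as Z0 for the next value gives the warm-started regularization path mentioned above.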


Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy

arXiv.org Machine Learning

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked, as a form of representation, to an observational or otherwise empirical domain of interest. "Structure" has long been understood as symmetry, which can take many forms with respect to a transformation: point, translational, rotational, and many others. Beginning with the role of number theory in expressing data, we show how we can naturally proceed to hierarchical structures. We show how this both encapsulates traditional paradigms in data analysis and opens up new perspectives on issues that are the order of the day, including data mining of massive, high-dimensional, heterogeneous data sets. Linkages with other fields, including computational logic and symbolic dynamics, are also discussed. The structures in data surveyed here are based on hierarchy, represented as p-adic numbers or an ultrametric topology.
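As a concrete handle on the hierarchy/ultrametric correspondence invoked above, the following sketch (a generic illustration on assumed toy data, not code from the paper) verifies that the cophenetic distances induced by a dendrogram satisfy the ultrametric inequality d(i,k) <= max(d(i,j), d(j,k)):

import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                        # toy observations
D = squareform(cophenet(linkage(X, method="average")))
n = D.shape[0]
# every triple must satisfy the strong (ultrametric) triangle inequality
print(all(D[i, k] <= max(D[i, j], D[j, k]) + 1e-9
          for i in range(n) for j in range(n) for k in range(n)))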


Percolation Thresholds of Updated Posteriors for Tracking Causal Markov Processes in Complex Networks

arXiv.org Machine Learning

Percolation on complex networks has been used to study computer viruses, epidemics, and other causal processes. Here, we present conditions for the existence of a network-specific, observation-dependent phase transition in the updated posterior of node states resulting from actively monitoring the network. Since traditional percolation thresholds are derived using observation-independent Markov chains, the threshold of the posterior should more accurately model the true phase transition of a network, as the updated posterior more accurately tracks the process. These conditions should provide insight into modeling the dynamic response of the updated posterior to active intervention and control policies while monitoring large complex networks.
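For background, the classical observation-independent threshold that the paper contrasts with can be seen in a small simulation (a generic illustration, not the paper's posterior-based analysis): bond percolation on an Erdos-Renyi graph with mean degree 3, where a giant component emerges once the edge-retention probability exceeds 1/3.

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
G = nx.erdos_renyi_graph(n=2000, p=3.0 / 2000, seed=0)
for rho in (0.1, 0.2, 0.3, 0.5, 0.8):
    H = nx.Graph()
    H.add_nodes_from(G)                               # keep all nodes
    H.add_edges_from(e for e in G.edges if rng.random() < rho)
    giant = max(nx.connected_components(H), key=len)
    # relative size of the largest component: near zero below the
    # threshold rho = 1/3, growing steadily above it
    print(rho, len(giant) / G.number_of_nodes())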


A more robust boosting algorithm

arXiv.org Machine Learning

We present a new boosting algorithm, motivated by the large-margins theory for boosting. We give experimental evidence that the new algorithm is significantly more robust against label noise than existing boosting algorithms.


Deformed Statistics Formulation of the Information Bottleneck Method

arXiv.org Machine Learning

The theoretical basis for a candidate variational principle for the information bottleneck (IB) method is formulated within the ambit of the generalized nonadditive statistics of Tsallis. Given a nonadditivity parameter $ q $, the role of the \textit{additive duality} of nonadditive statistics ($ q^*=2-q $) in relating Tsallis entropies for the ranges $ q < 1 $ and $ q > 1 $ of the nonadditivity parameter is described. Defining $ X $, $ \tilde X $, and $ Y $ to be the source alphabet, the compressed reproduction alphabet, and the \textit{relevance variable}, respectively, it is demonstrated that minimization of a generalized IB (gIB) Lagrangian defined in terms of the nonadditivity parameter $ q^* $ self-consistently yields the \textit{nonadditive effective distortion measure} as the \textit{$ q $-deformed} generalized Kullback-Leibler divergence $ D_{K-L}^{q}[p(Y|X)||p(Y|\tilde X)] $. This result is achieved without enforcing any \textit{a priori} assumptions. Next, it is proven that the $ q^* $-deformed nonadditive free energy of the system is non-negative and convex. Finally, the update equations for the gIB method are derived. These results generalize critical features of the IB method to the case of Tsallis statistics.
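For reference, the standard Tsallis-statistics ingredients the abstract relies on are (textbook definitions, not results of this paper): the $ q $-logarithm $ \ln_q(x) = (x^{1-q}-1)/(1-q) $, the Tsallis entropy $ S_q[p] = \frac{1}{q-1}\left(1 - \sum_x p(x)^q\right) $, and the $ q $-deformed Kullback-Leibler divergence $ D_{K-L}^{q}[p||r] = \frac{1}{q-1}\left(\sum_x p(x)^q\, r(x)^{1-q} - 1\right) $, each of which recovers its Shannon counterpart as $ q \to 1 $; the additive duality pairs the descriptions at $ q $ and $ q^* = 2-q $.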


Multiscale Inference for High-Frequency Data

arXiv.org Machine Learning

This paper proposes a novel multiscale estimator for the integrated volatility of an Ito process, in the presence of market microstructure noise (observation error). The multiscale structure of the observed process is represented frequency-by-frequency and the concept of the multiscale ratio is introduced to quantify the bias in the realized integrated volatility due to the observation error. The multiscale ratio is estimated from a single sample path, and a frequency-by-frequency bias correction procedure is proposed, which simultaneously reduces variance. We extend the method to include correlated observation errors and provide the implied time domain form of the estimation procedure. The new method is implemented to estimate the integrated volatility for the Heston and other models, and the improved performance of our method over existing methods is illustrated by simulation studies.
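The bias being corrected can be seen in a small time-domain simulation (a generic illustration of microstructure bias under assumed noise levels, not the authors' frequency-by-frequency estimator): realized volatility computed from noisy prices overstates the integrated volatility, and the overstatement shrinks as the sampling interval coarsens.

import numpy as np

rng = np.random.default_rng(1)
n = 23400                                  # one trading day at 1-second ticks
sigma = 0.2 / np.sqrt(252)                 # daily volatility of the true price
true_log_price = np.cumsum(sigma / np.sqrt(n) * rng.normal(size=n))
obs = true_log_price + 5e-4 * rng.normal(size=n)   # microstructure noise

def realized_vol(x, k):
    r = np.diff(x[::k])                    # returns at sampling interval k
    return float(np.sum(r ** 2))

print("true IV:", sigma ** 2)
for k in (1, 5, 30, 300):
    print(k, realized_vol(obs, k))         # noise bias ~ 2 * (n/k) * noise_var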


On the Distribution of Penalized Maximum Likelihood Estimators: The LASSO, SCAD, and Thresholding

arXiv.org Machine Learning

We study the distributions of the LASSO, SCAD, and thresholding estimators in finite samples and in the large-sample limit. The asymptotic distributions are derived both for the case where the estimators are tuned to perform consistent model selection and for the case where they are tuned to perform conservative model selection. Our findings complement those of Knight and Fu (2000) and Fan and Li (2001). We show that the distributions are typically highly nonnormal regardless of how the estimator is tuned, and that this property persists in large samples. The uniform convergence rate of these estimators is also obtained and is shown to be slower than $ 1/\sqrt{n} $ when the estimator is tuned to perform consistent model selection. An impossibility result regarding estimation of the estimators' distribution function is also provided.
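For intuition about the nonnormality (a standard fact, not specific to this paper): in an orthogonal design the LASSO reduces to coordinatewise soft thresholding of the least-squares estimate, $ \hat\beta_j = \mathrm{sign}(\hat\beta_j^{\mathrm{LS}})\,(|\hat\beta_j^{\mathrm{LS}}| - \lambda)_+ $, so the estimator places positive probability mass exactly at zero, and its distribution is a mixture of a point mass and an absolutely continuous part rather than anything close to a normal.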


The Redundancy of a Computable Code on a Noncomputable Distribution

arXiv.org Machine Learning

We introduce new definitions of universal and superuniversal computable codes, based on a code's ability to approximate Kolmogorov complexity within a prescribed margin for all individual sequences from a given set. Such sets of sequences may be singled out almost surely with respect to certain probability measures. Consider a measure parameterized with a real parameter, and put an arbitrary prior on that parameter. The Bayesian measure is the expectation of the parameterized measure with respect to the prior. It turns out that a modified Shannon-Fano code for any computable Bayesian measure, which we call the Bayesian code, is superuniversal on a set of parameterized-measure-almost-all sequences for prior-almost every parameter. According to this result, in the typical setting of mathematical statistics no computable code enjoys redundancy that is ultimately much less than that of the Bayesian code. We therefore introduce another characteristic of computable codes: the catch-up time, the length of data for which the code length drops below the Kolmogorov complexity plus the prescribed margin. Some codes may have smaller catch-up times than Bayesian codes.
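To fix notation (a standard construction; the paper's precise definitions are not reproduced in the abstract): the Shannon-Fano code for a computable measure $ P $ has code lengths $ \ell_P(x_{1:n}) = \lceil -\log_2 P(x_{1:n}) \rceil $, the Bayesian measure is the mixture $ P_\pi(x_{1:n}) = \int P_\theta(x_{1:n})\, \pi(d\theta) $, and the redundancy of a code with lengths $ \ell $ relative to Kolmogorov complexity $ K $ is $ \ell(x_{1:n}) - K(x_{1:n}) $.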


Bayesian MAP Model Selection of Chain Event Graphs

arXiv.org Machine Learning

The class of chain event graph models is a generalisation of the class of discrete Bayesian networks, retaining most of the structural advantages of the Bayesian network for model interrogation, propagation, and learning, while more naturally encoding asymmetric state spaces and the order in which events happen. In this paper we demonstrate how, with complete sampling, conjugate closed-form model selection based on product Dirichlet priors is possible, and we prove that suitable homogeneity assumptions characterise the product Dirichlet prior on this class of models. We demonstrate our techniques using two educational examples.
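The closed-form score that such conjugacy yields is the Dirichlet-multinomial marginal likelihood, summed over the stages of a candidate model. The sketch below is a generic illustration with made-up counts and uniform priors, not the paper's implementation:

import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_score(counts, alpha):
    """Log marginal likelihood of counts under a Dirichlet(alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
            + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

# a model's score is the sum of its stage scores; compare across stagings
stages = [([30, 12], [1.0, 1.0]), ([5, 40, 7], [1.0, 1.0, 1.0])]
print(sum(dirichlet_multinomial_score(n, a) for n, a in stages))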