arXiv.org Machine Learning
Kernel Regression by Mode Calculation of the Conditional Probability Distribution
The most direct way to express arbitrary dependencies in a dataset is to estimate the joint distribution and then apply the argmax function to obtain the mode of the corresponding conditional distribution. In practice this is difficult, because it requires global optimization of a complicated function: the joint distribution with the input variables held fixed. This article proposes a method for finding global maxima when the joint distribution is modeled by a kernel density estimate. Experiments show advantages and shortcomings of the resulting regression method in comparison with the standard Nadaraya-Watson regression technique, which approximates the optimum by the expectation value.
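A minimal sketch of the two estimators being compared, assuming Gaussian kernels, one-dimensional inputs and outputs, and a simple grid search standing in for the paper's global optimization:

    import numpy as np

    def gaussian_kernel(u):
        return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    def nadaraya_watson(x0, X, Y, h):
        """Standard Nadaraya-Watson estimate: kernel-weighted mean of Y at x0."""
        w = gaussian_kernel((X - x0) / h)
        return np.sum(w * Y) / np.sum(w)

    def mode_regression(x0, X, Y, h, grid=None):
        """Mode of the kernel conditional density p(y | x0).
        A grid search stands in for the paper's global optimization."""
        if grid is None:
            grid = np.linspace(Y.min(), Y.max(), 500)
        wx = gaussian_kernel((X - x0) / h)              # weights from the input kernel
        dens = np.array([np.sum(wx * gaussian_kernel((y0 - Y) / h)) for y0 in grid])
        return grid[np.argmax(dens)]                    # argmax over y at fixed x0

    # toy example: Y is bimodal given X, so conditional mean and mode disagree
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, 400)
    Y = np.sign(rng.standard_normal(400)) + 0.1 * rng.standard_normal(400) + 0.2 * X
    print(nadaraya_watson(0.0, X, Y, h=0.2))   # near 0 (the conditional mean)
    print(mode_regression(0.0, X, Y, h=0.2))   # near +1 or -1 (a conditional mode)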
Random Forests: some methodological insights
Genuer, Robin, Poggi, Jean-Michel, Tuleau, Christine
This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but sparse advice for using random forests and at proposing some complementary remarks, both for standard problems and for high-dimensional ones in which the number of variables hugely exceeds the sample size. The main contribution of this paper is twofold: to provide some insights about the behavior of the variable importance index based on random forests and, in addition, to investigate two classical issues of variable selection. The first is to find important variables for interpretation; the second is more restrictive and tries to design a good prediction model. The strategy involves ranking the explanatory variables by the random forests importance score followed by a stepwise ascending variable introduction strategy.
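A rough illustration of the two-step strategy described above, using scikit-learn's random forest importance score; the forest sizes, cross-validation scoring, and stopping rule are simplified placeholders, not the procedures studied in the paper:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def rank_variables(X, y, n_estimators=500, random_state=0):
        """Rank variables by the random-forest importance score, most important first."""
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        rf.fit(X, y)
        return np.argsort(rf.feature_importances_)[::-1]

    def stepwise_selection(X, y, ranking, cv=5):
        """Introduce ranked variables one by one; keep the prefix with the best CV score."""
        best_score, best_k = -np.inf, 0
        for k in range(1, len(ranking) + 1):
            cols = ranking[:k]
            score = cross_val_score(
                RandomForestClassifier(n_estimators=200, random_state=0),
                X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_score, best_k = score, k
        return ranking[:best_k], best_score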
High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence
Ravikumar, Pradeep, Wainwright, Martin J., Raskutti, Garvesh, Yu, Bin
Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$ and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. We estimate $\Theta^*$ by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$, and the maximum node degree $d$ are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities that control the rates of convergence: (a) the $\ell_\infty$-operator norm of the true covariance matrix $\Sigma^*$; (b) the $\ell_\infty$-operator norm of the sub-matrix $\Gamma^*_{S S}$, where $S$ indexes the graph edges, and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$; and (d) the rate of decay $1/f(n,\delta)$ of the probabilities $\{|\hat{\Sigma}^n_{ij}- \Sigma^*_{ij}| > \delta \}$, where $\hat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\hat{\Theta}$ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in the Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $d = o(\sqrt{s})$. In our second result, we show that with probability converging to one, the estimate $\hat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$.
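In the Gaussian case this estimator coincides with the graphical lasso, so a brief illustration of the $\ell_1$-penalized log-determinant fit can be given with scikit-learn; the chain-graph precision matrix and the regularization weight alpha below are placeholder choices, not values prescribed by the paper's theory:

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    # sparse true precision matrix Theta* on a chain graph
    p = 10
    theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    sigma = np.linalg.inv(theta)
    X = rng.multivariate_normal(np.zeros(p), sigma, size=500)

    model = GraphicalLasso(alpha=0.05).fit(X)     # l1-penalized log-det / Gaussian MLE
    theta_hat = model.precision_                  # estimate of Theta* = (Sigma*)^{-1}
    support = np.abs(theta_hat) > 1e-4            # recovered zero pattern (graph edges)
    print(np.max(np.abs(theta_hat - theta)))      # elementwise maximum-norm error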
Inference with Discriminative Posterior
Salojärvi, Jarkko, Puolamäki, Kai, Savia, Eerika, Kaski, Samuel
We study Bayesian discriminative inference given a model family $p(c, x, \theta)$ that is assumed to contain all our prior information but is still known to be incorrect. This falls in between "standard" Bayesian generative modeling and Bayesian regression, where the marginal $p(x, \theta)$ is known to be uninformative about $p(c|x, \theta)$. We give an axiomatic proof that the discriminative posterior is consistent for conditional inference; using the discriminative posterior is standard practice in classical Bayesian regression, but we show that it is theoretically justified for model families of joint densities as well. A practical benefit compared to Bayesian regression is that the standard methods for handling missing values in generative modeling can be extended to discriminative inference, which is useful when the amount of data is small. Compared to standard generative modeling, the discriminative posterior yields better conditional inference when the model family is incorrect. If the model family also contains the true model, the discriminative posterior gives the same result as standard Bayesian generative modeling. Practical computation is done with Markov chain Monte Carlo.
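A minimal sketch of the idea: the parameter is weighted by the conditional likelihood $\prod_i p(c_i|x_i,\theta)$ rather than the joint likelihood, and samples are drawn by Markov chain Monte Carlo. The toy model family (two unit-variance Gaussian class-conditionals with equal class priors), the random-walk Metropolis sampler, and the wide Gaussian prior are all simplified placeholders, not the constructions used in the paper:

    import numpy as np
    from scipy.stats import norm

    def conditional_log_lik(theta, x, c):
        """log prod_i p(c_i | x_i, theta) for two unit-variance Gaussian classes
        with means theta[0], theta[1] and equal class priors."""
        l0 = norm.logpdf(x, theta[0], 1.0)
        l1 = norm.logpdf(x, theta[1], 1.0)
        log_post1 = l1 - np.logaddexp(l0, l1)          # log p(c=1 | x, theta)
        return np.sum(np.where(c == 1, log_post1, np.log1p(-np.exp(log_post1))))

    def discriminative_posterior_samples(x, c, n_samples=5000, step=0.2, seed=0):
        """Random-walk Metropolis targeting prior(theta) * conditional likelihood."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(2)
        cur = conditional_log_lik(theta, x, c) + np.sum(norm.logpdf(theta, 0, 10))
        out = []
        for _ in range(n_samples):
            prop = theta + step * rng.standard_normal(2)
            new = conditional_log_lik(prop, x, c) + np.sum(norm.logpdf(prop, 0, 10))
            if np.log(rng.uniform()) < new - cur:
                theta, cur = prop, new
            out.append(theta.copy())
        return np.array(out)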
A Multivariate Regression Approach to Association Analysis of Quantitative Trait Network
Kim, Seyoung, Sohn, Kyung-Ah, Xing, Eric P.
Many complex disease syndromes such as asthma consist of a large number of highly related, rather than independent, clinical phenotypes, raising a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to address this issue in a principled way. Our approach explicitly represents the dependency structure among the quantitative traits as a network, and leverages this trait network to encode structured regularization in a multivariate regression model over the genotypes and traits, so that the genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most traditional methods examine each phenotype independently and combine the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework and borrows information across correlated phenotypes to discover the genetic markers that perturb a subset of correlated traits jointly rather than a single trait. Using simulated datasets based on the HapMap consortium data and an asthma dataset, we compare the performance of our method with single-marker analysis and with other sparse regression methods, such as ridge regression and the lasso, that do not use any structural information in the traits. Our results show a significant advantage in detecting the true causal SNPs when the correlation pattern among traits is incorporated through our proposed methods.
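A rough sketch of the kind of penalized objective the abstract describes, assuming a squared-error loss, a lasso term, and a graph-guided fusion term over trait-network edges; the exact edge weighting and the optimization algorithm in GFlasso differ, so this only illustrates the structure:

    import numpy as np

    def gflasso_objective(B, X, Y, edges, lam1, lam2):
        """B: p x k coefficient matrix (p SNPs, k traits), X: n x p genotypes,
        Y: n x k traits. edges: list of (trait_s, trait_t, correlation r_st)
        taken from the trait network."""
        loss = 0.5 * np.sum((Y - X @ B) ** 2)            # squared-error loss over all traits
        lasso = lam1 * np.sum(np.abs(B))                 # sparsity across all coefficients
        fusion = lam2 * sum(np.sum(np.abs(B[:, s] - np.sign(r) * B[:, t]))
                            for s, t, r in edges)        # fuse coefficients of correlated traits
        return loss + lasso + fusion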
Improved Estimation of High-dimensional Ising Models
We consider the problem of jointly estimating the parameters as well as the structure of binary-valued Markov random fields, in contrast to earlier work that focuses on one of the two problems. We formulate the problem as the maximization of an $\ell_1$-regularized surrogate likelihood, which allows us to find a sparse solution. Our optimization technique efficiently incorporates the cutting-plane algorithm in order to obtain a tighter outer bound on the marginal polytope, which improves both the parameter estimates and the approximation to the marginals. On synthetic data, we compare our algorithm to other existing methods on the two estimation tasks. We analyze the method in the high-dimensional setting, where the number of dimensions $p$ is allowed to grow with the number of observations $n$. The rate of convergence of the estimate is shown to depend explicitly on the sparsity of the underlying graph.
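The surrogate-likelihood and cutting-plane machinery of the paper is not reproduced here. As a loose point of reference only, the sketch below shows node-wise $\ell_1$-regularized logistic regression, a simpler and different method that is commonly used for sparse Ising structure estimation; the penalty strength C is a placeholder:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ising_neighborhoods(X, C=0.1):
        """X: n x p matrix of {-1, +1} spins. Regress each node on the others with an
        l1 penalty; nonzero coefficients indicate candidate edges of the graph."""
        n, p = X.shape
        adj = np.zeros((p, p))
        for j in range(p):
            others = np.delete(np.arange(p), j)
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            clf.fit(X[:, others], X[:, j])
            adj[j, others] = clf.coef_.ravel()
        return (np.abs(adj) + np.abs(adj.T)) > 1e-6      # symmetrize to get the edge set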
Hierarchical structure and the prediction of missing links in networks
Clauset, Aaron, Moore, Cristopher, Newman, M. E. J.
Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, where vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases these groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks, or genetic regulatory networks), or communities in social networks. Here we present a general technique for inferring hierarchical structure from network data and demonstrate that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients, and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partially known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
Gibbs posterior for variable selection in high-dimensional classification and data mining
Jiang, Wenxin, Tanner, Martin A.
In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here: studying BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing that risk without modeling the data probabilistically. This can improve performance over the usual Bayesian approach, which depends on a probability model that may be misspecified. Conditions will be provided for achieving good risk performance, even in the presence of high dimensionality, when the number of candidate variables "$K$" can be much larger than the sample size "$n$." In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
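A minimal sketch of a Gibbs posterior over variable subsets, assuming the risk is the training classification error of a plug-in logistic classifier; the inverse temperature, the subset-size prior, and the Metropolis moves are placeholder choices rather than the ones analyzed in the paper:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def risk(subset, X, y):
        """Empirical classification error using only the selected variables (y in {0, 1})."""
        if not subset:
            return np.mean(y != np.round(np.mean(y)))
        cols = sorted(subset)
        clf = LogisticRegression().fit(X[:, cols], y)
        return np.mean(clf.predict(X[:, cols]) != y)

    def gibbs_posterior_mcmc(X, y, n_iter=2000, temp=1.0, prior_pen=2.0, seed=0):
        """Metropolis sampler for the Gibbs posterior
        proportional to exp(-n * temp * risk(S)) * exp(-prior_pen * |S|)."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        S = set()
        cur = -n * temp * risk(S, X, y) - prior_pen * len(S)
        visits = np.zeros(p)
        for _ in range(n_iter):
            j = int(rng.integers(p))                  # propose flipping one variable in or out
            T = S ^ {j}
            new = -n * temp * risk(T, X, y) - prior_pen * len(T)
            if np.log(rng.uniform()) < new - cur:
                S, cur = T, new
            for k in S:
                visits[k] += 1
        return visits / n_iter                        # inclusion frequencies per variable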
Choice of neighbor order in nearest-neighbor classification
Hall, Peter, Park, Byeong U., Samworth, Richard J.
The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of $k$.
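The paper derives asymptotic expressions for the risk as a function of $k$; as a purely empirical complement, and not the paper's procedure, $k$ is often chosen by cross-validation, as in this brief sketch (the candidate range and fold count are arbitrary):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def choose_k(X, y, k_values=range(1, 31), cv=10):
        """Pick the number of neighbors with the best cross-validated accuracy."""
        scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=cv).mean()
                  for k in k_values]
        return list(k_values)[int(np.argmax(scores))], scores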
Statistical Learning Theory: Models, Concepts, and Results
von Luxburg, Ulrike, Schoelkopf, Bernhard
Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We target a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want an overview of the field before diving into technical details.