arXiv.org Machine Learning
On Universal Prediction and Bayesian Confirmation
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. We discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. We show that Solomonoff's model possesses many desirable properties: it admits strong total and weak instantaneous bounds; unlike most classical continuous prior densities, it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problems. It even performs well (actually better) in non-computable environments.
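For reference, the standard Bayes-mixture and Solomonoff-prior formulations underlying this line of work can be written as follows (standard notation, not quoted from the paper): $$\xi(x_{1:n}) = \sum_{\nu \in \mathcal{M}} w_\nu\, \nu(x_{1:n}), \qquad M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},$$ where $\mathcal{M}$ is the model class, $w_\nu > 0$ are prior weights summing to at most one, $U$ is a universal monotone Turing machine, and $\ell(p)$ is the length of program $p$. Solomonoff's choice corresponds to taking $\mathcal{M}$ to be the class of lower semicomputable semimeasures with weights $w_\nu = 2^{-K(\nu)}$, where $K$ denotes prefix Kolmogorov complexity.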
Online Learning in Discrete Hidden Markov Models
Alamino, Roberto C., Caticha, Nestor
We present and analyse three online algorithms for learning in discrete Hidden Markov Models (HMMs) and compare them with the Baldi-Chauvin algorithm. Using the Kullback-Leibler divergence as a measure of generalisation error, we draw learning curves in simplified situations. The performance of one of the presented algorithms at learning drifting concepts is analysed and compared with that of the Baldi-Chauvin algorithm in the same situations. A brief discussion of learning and symmetry breaking based on our results is also presented.
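As an illustration of the error measure mentioned above, here is a minimal numpy sketch of the Kullback-Leibler divergence between two discrete distributions (generic code, not the paper's online HMM algorithms):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # eps guards against log(0); KL is infinite if q has zero mass where p does not
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Example: divergence between an estimated and a true emission distribution
print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```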
Piecewise linear regularized solution paths
We consider the generic regularized optimization problem $\hat{\beta}(\lambda)=\arg\min_{\beta} L(y, X\beta)+\lambda J(\beta)$. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown that for the LASSO--that is, if $L$ is squared error loss and $J(\beta)=\|\beta\|_1$ is the $\ell_1$ norm of $\beta$--the optimal coefficient path is piecewise linear, that is, $\partial \hat{\beta}(\lambda)/\partial \lambda$ is piecewise constant. We derive a general characterization of the properties of (loss $L$, penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.
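The squared-error / $\ell_1$ case can be visualized with standard tooling; below is a hedged sketch using scikit-learn's `lasso_path` (an off-the-shelf path algorithm, not the general (loss $L$, penalty $J$) characterization developed in the paper):

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
beta_true = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.1 * rng.standard_normal(200)

# Evaluate the path on a uniform grid of lambda values; because the
# squared-error/L1 path is piecewise linear in lambda, each coefficient
# varies linearly between "kinks" (changes of the active set).
lambdas = np.linspace(0.01, 1.0, 50)
alphas, coefs, _ = lasso_path(X, y, alphas=lambdas)

for j in range(3):
    print(f"coefficient {j} along the path:", np.round(coefs[j], 3))
```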
Fast rates for support vector machines using Gaussian kernels
Steinwart, Ingo, Scovel, Clint
For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov's noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike previously proposed concepts for bounding the approximation error, the geometric noise assumption does not employ any smoothness assumption.
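A minimal scikit-learn sketch of the learning setup studied here, i.e. a hinge-loss SVM with a Gaussian RBF kernel on a binary classification problem (illustration only; the paper's contribution is the rate analysis, not the algorithm, and the data and parameter values below are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

# Synthetic binary classification data with label noise
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVC with kernel='rbf' is the hinge-loss SVM with Gaussian kernel
# exp(-gamma * ||x - x'||^2); gamma plays the role of the kernel width.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```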
Efficient independent component analysis
Independent component analysis (ICA) has been widely used for blind source separation in many fields such as brain imaging analysis, signal processing and telecommunication. Many statistical techniques based on M-estimates have been proposed for estimating the mixing matrix. Recently, several nonparametric methods have been developed, but in-depth analysis of asymptotic efficiency has not been available. We analyze ICA using semiparametric theories and propose a straightforward estimate based on the efficient score function by using B-spline approximations. The estimate is asymptotically efficient under moderate conditions and exhibits better performance than standard ICA methods in a variety of simulations.
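For orientation, here is a standard ICA baseline (scikit-learn's FastICA, not the B-spline efficient-score estimator proposed in the paper) applied to a toy two-source mixture:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 5000
# Two independent non-Gaussian sources (Laplace and uniform)
S = np.column_stack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])      # mixing matrix
X = S @ A.T                                  # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                 # recovered sources (up to order/scale)
print("estimated mixing matrix:\n", ica.mixing_)
```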
Learning from dependent observations
Steinwart, Ingo, Hush, Don, Scovel, Clint
In most papers establishing consistency for learning algorithms it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) essentially only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for $\alpha$-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise.
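A toy illustration of training on dependent observations: inputs generated by an AR(1) process (a standard example of a geometrically $\alpha$-mixing sequence under mild conditions), with an SVM fit exactly as in the i.i.d. case (a sketch only; the paper's contribution is the consistency analysis, not a new algorithm):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
T = 3000

# AR(1) input process: x_t = 0.8 * x_{t-1} + noise (dependent, non-i.i.d.)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

# Noisy threshold labels
y = (x + 0.3 * rng.standard_normal(T) > 0).astype(int)

X = x.reshape(-1, 1)
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X[:2000], y[:2000])
print("accuracy on later segment:", clf.score(X[2000:], y[2000:]))
```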
Metric Embedding for Nearest Neighbor Classification
Sriperumbudur, Bharath K., Lanckriet, Gert R. G.
The distance metric plays an important role in nearest neighbor (NN) classification. Usually the Euclidean distance metric is assumed or a Mahalanobis distance metric is optimized to improve the NN performance. In this paper, we study the problem of embedding arbitrary metric spaces into a Euclidean space with the goal to improve the accuracy of the NN classifier. We propose a solution by appealing to the framework of regularization in a reproducing kernel Hilbert space and prove a representer-like theorem for NN classification. The embedding function is then determined by solving a semidefinite program which has an interesting connection to the soft-margin linear binary support vector machine classifier. Although the main focus of this paper is to present a general, theoretical framework for metric embedding in a NN setting, we demonstrate the performance of the proposed method on some benchmark datasets and show that it performs better than the Mahalanobis metric learning algorithm in terms of leave-one-out and generalization errors.
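The effect of a metric or embedding choice on nearest-neighbor error can be illustrated with a simple comparison of leave-one-out errors under the raw Euclidean metric versus after a feature rescaling; this is only a toy stand-in for the paper's RKHS/semidefinite-programming formulation, using an arbitrary benchmark dataset:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)
loo = LeaveOneOut()

# Euclidean metric on raw features vs. on standardized features
# (standardization is a trivial linear embedding; learned embeddings go further)
err_raw = 1 - cross_val_score(knn, X, y, cv=loo).mean()
err_std = 1 - cross_val_score(knn, StandardScaler().fit_transform(X), y, cv=loo).mean()
print(f"LOO error, raw features: {err_raw:.3f}")
print(f"LOO error, standardized features: {err_std:.3f}")
```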
A tutorial on conformal prediction
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability $\epsilon$, together with a method that makes a prediction $\hat{y}$ of a label $y$, it produces a set of labels, typically containing $\hat{y}$, that also contains $y$ with probability $1-\epsilon$. Conformal prediction can be applied to any method for producing $\hat{y}$: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an on-line setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right $1-\epsilon$ of the time, even though they are based on an accumulating dataset rather than on independent datasets. In addition to the model under which successive examples are sampled independently, other on-line compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in "Algorithmic Learning in a Random World", by Vladimir Vovk, Alex Gammerman, and Glenn Shafer (Springer, 2005).
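A minimal sketch of the idea in its simplest "split" form for regression (the tutorial treats the full online/transductive protocol; this simplified variant only illustrates how a nonconformity score turns point predictions into prediction intervals, with illustrative data and model choices):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(1000)

# Split into a proper training set and a calibration set
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = Ridge(alpha=1.0).fit(X_tr, y_tr)

# Nonconformity scores on the calibration set: absolute residuals
scores = np.abs(y_cal - model.predict(X_cal))

eps = 0.1  # target error probability
n = len(scores)
# Conformal quantile (finite-sample corrected)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - eps)) / n))

x_new = np.array([[1.0]])
y_hat = model.predict(x_new)[0]
print(f"prediction interval: [{y_hat - q:.3f}, {y_hat + q:.3f}]")
```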
Mixed membership stochastic blockmodels
Airoldi, Edoardo M, Blei, David M, Fienberg, Stephen E, Xing, Eric P
Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.
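For concreteness, here is a small numpy sketch of the commonly stated MMSB generative process (per-node mixed-membership vectors, per-pair membership indicators, and a blockmatrix of interaction probabilities); the notation and hyperparameter values are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 3                       # nodes, latent blocks
alpha = np.full(K, 0.1)            # Dirichlet hyperparameter (illustrative)
B = 0.05 + 0.9 * np.eye(K)         # block interaction probabilities (illustrative)

# Mixed-membership vector for each node
pi = rng.dirichlet(alpha, size=N)              # shape (N, K)

Y = np.zeros((N, N), dtype=int)
for p in range(N):
    for q in range(N):
        if p == q:
            continue
        z_pq = rng.choice(K, p=pi[p])          # sender's role for this pair
        z_qp = rng.choice(K, p=pi[q])          # receiver's role for this pair
        Y[p, q] = rng.binomial(1, B[z_pq, z_qp])

print("adjacency matrix density:", Y.mean())
```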
Undercomplete Blind Subspace Deconvolution
Szabo, Zoltan, Poczos, Barnabas, Lorincz, Andras
We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation, we reduce this problem to ISA. The associated 'high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.
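The temporal-concatenation step mentioned above can be sketched generically: stacking time-shifted copies of the observed mixture turns a convolutive problem into a higher-dimensional instantaneous one that an ISA/ICA routine can then process (toy numpy illustration; the variable names and shift length are assumptions, not the paper's notation):

```python
import numpy as np

def temporal_concatenation(X, L):
    """Stack L consecutive observations into one higher-dimensional sample.

    X : array of shape (T, d) -- observed mixture, T time steps, d channels.
    Returns an array of shape (T - L + 1, d * L).
    """
    T, d = X.shape
    return np.column_stack([X[i:T - L + 1 + i] for i in range(L)])

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))      # placeholder observations
X_concat = temporal_concatenation(X, L=3)
print(X_concat.shape)                    # (998, 12)
```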