AITopics

doi: 10.1613/jair.3427

AI Access Foundation

10736

Country: Europe > Sweden > Östergötland County > Linköping (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Simon, Noah, Tibshirani, Rob

Discriminant Analysis with Adaptively Pooled Covariance

arXiv.org Machine LearningDec-6-2011

Linear and Quadratic Discriminant analysis (LDA/QDA) are common tools for classification problems. For these methods we assume observations are normally distributed within group. We estimate a mean and covariance matrix for each group and classify using Bayes theorem. With LDA, we estimate a single, pooled covariance matrix, while for QDA we estimate a separate covariance matrix for each group. Rarely do we believe in a homogeneous covariance structure between groups, but often there is insufficient data to separately estimate covariance matrices. We propose 1-PDA, a regularized model which adaptively pools elements of the precision matrices. Adaptively pooling these matrices decreases the variance of our estimates (as in LDA), without overly biasing them. In this paper, we propose and discuss this method, give an efficient algorithm to fit it for moderate sized problems, and show its efficacy on real and simulated datasets. Keywords: Lasso, Penalized, Discriminant Analysis, Interactions, Classification 1. Introduction Consider the usual two class problem: our data consists ofn observations, each observation with a known class label { 1, 2}, and p covariates measured per observation.

artificial intelligence, bayesian inference, machine learning, (13 more...)

1111.1687

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine LearningDec-3-2011

Dimension adaptability of Gaussian process models with variable selection and projection

Tokdar, Surya T.

It is now known that an extended Gaussian process model equipped with rescaling can adapt to different smoothness levels of a function valued parameter in many nonparametric Bayesian analyses, offering a posterior convergence rate that is optimal (up to logarithmic factors) for the smoothness class the true function belongs to. This optimal rate also depends on the dimension of the function's domain and one could potentially obtain a faster rate of convergence by casting the analysis in a lower dimensional subspace that does not amount to any loss of information about the true function. In general such a subspace is not known a priori but can be explored by equipping the model with variable selection or linear projection. We demonstrate that for nonparametric regression, classification, density estimation and density regression, a rescaled Gaussian process model equipped with variable selection or linear projection offers a posterior convergence rate that is optimal (up to logarithmic factors) for the lowest dimension in which the analysis could be cast without any loss of information about the true function. Theoretical exploration of such dimension reduction features appears novel for Bayesian nonparametric models with or without Gaussian processes.

artificial intelligence, machine learning, regression, (15 more...)

1112.0716

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Sugiyama, Masashi, Yamada, Makoto, Kimura, Manabu, Hachiya, Hirotaka

Information-Maximization Clustering based on Squared-Loss Mutual Information

arXiv.org Machine LearningDec-2-2011

Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1112.0611

Country:

Asia (0.93)
North America > United States > California (0.46)
Europe > United Kingdom > England (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

arXiv.org Machine LearningNov-29-2011

Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey

Zhou, Yang

Probabilistic graphical models combine the graph theory and probability theory to give a multivariate statistical modeling. They provide a unified description of uncertainty using probability and complexity using the graphical model. Especially, graphical models provide the following several useful properties: - Graphical models provide a simple and intuitive interpretation of the structures of probabilistic models. On the other hand, they can be used to design and motivate new models. - Graphical models provide additional insights into the properties of the model, including the conditional independence properties. - Complex computations which are required to perform inference and learning in sophisticated models can be expressed in terms of graphical manipulations, in which the underlying mathematical expressions are carried along implicitly. The graphical models have been applied to a large number of fields, including bioinformatics, social science, control theory, image processing, marketing analysis, among others. However, structure learning for graphical models remains an open challenge, since one must cope with a combinatorial search over the space of all possible structures. In this paper, we present a comprehensive survey of the existing structure learning algorithms.

artificial intelligence, bayesian inference, machine learning, (15 more...)

1111.6925

Country: North America > United States > California (0.28)

Genre:

Research Report (1.00)
Overview (0.90)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

arXiv.org Artificial IntelligenceNov-29-2011

Bayesian Causal Induction

Ortega, Pedro A.

Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1111.0708

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

arXiv.org Machine LearningNov-26-2011

Optimal exponential bounds on the accuracy of classification

Pentacaput, N. I.

We consider a standard binary classification problem. The performance of any binary classifier based on the training data is characterized by the excess risk. We study Bahadur's type exponential bounds on the minimax accuracy confidence function based on the excess risk. We study how this quantity depends on the complexity of the class of distributions characterized by exponents of entropies of the class of regression functions or of the class of Bayes classifiers corresponding to the distributions from the class. We also study its dependence on margin parameters of the classification problem.

artificial intelligence, machine learning, margin condition, (17 more...)

1111.616

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Journal of Artificial Intelligence ResearchNov-25-2011

Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization

Golovin, D., Krause, A.

Many problems in artificial intelligence require adaptively making a sequence of decisions with uncertain outcomes under partial observability. Solving such stochastic optimization problems is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse AI applications including management of sensing resources, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.

algorithm, avg, realization, (15 more...)

doi: 10.1613/jair.3278

AI Access Foundation

10731

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany (0.04)
(10 more...)

Genre: Overview (0.45)

Industry:

Information Technology (0.46)
Transportation (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
(2 more...)

Wahlberg, Bo, Rojas, Cristian R., Annergren, Mariette

On l_1 Mean and Variance Filtering

arXiv.org Machine LearningNov-25-2011

This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normal distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to estimate the change times and the mean values within the segments. The second case is when the mean value is constant, but the variance can change. The assumption is that the variance is piecewise constant in time, and we want to estimate change times and the variance values within the segments. To find solutions to these problems, we will study an l_1 regularized maximum likelihood method, related to the fused lasso method and l_1 trend filtering, where the parameters to be estimated are free to vary at each sample. To penalize variations in the estimated parameters, the l_1-norm of the time difference of the parameters is used as a regularization term. This idea is closely related to total variation denoising. The main contribution is that a convex formulation of this variance estimation problem, where the parametrization is based on the inverse of the variance, can be formulated as a certain l_1 mean estimation problem. This implies that results and methods for mean estimation can be applied to the challenging problem of variance segmentation/estimation.

artificial intelligence, machine learning, variance, (18 more...)

1111.5948

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.33)

Hamze, Firas, Wang, Ziyu, de Freitas, Nando

Self-Avoiding Random Dynamics on Integer Complex Systems

arXiv.org Machine LearningNov-25-2011

This paper introduces a new specialized algorithm for equilibrium Monte Carlo sampling of binary-valued systems, which allows for large moves in the state space. This is achieved by constructing self-avoiding walks (SAWs) in the state space. As a consequence, many bits are flipped in a single MCMC step. We name the algorithm SARDONICS, an acronym for Self-Avoiding Random Dynamics on Integer Complex Systems. The algorithm has several free parameters, but we show that Bayesian optimization can be used to automatically tune them. SARDONICS performs remarkably well in a broad number of sampling tasks: toroidal ferromagnetic and frustrated Ising models, 3D Ising models, restricted Boltzmann machines and chimera graphs arising in the design of quantum computers.

artificial intelligence, bayesian inference, machine learning, (21 more...)

1111.5379

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(3 more...)