AITopics

In addition, the Dirichlet prior is known as a distribution that ensures likelihood equivalence; this score is known as \Bayesian Dirichlet equivalence (BDe)" (Heckerman et al., 1995). Given no prior knowledge, the Bayesian Dirichlet equivalence uniform (BDeu), as proposed earlier by Buntine (1991), is often used. Actually, BDe(u) requires an \equivalent sample size (ESS)", which re ects the degree of a user's prior belief. Moreover, recent studies have demonstrated that ESS plays an important role in the resulting network structure estimate. Steck and Jaakkola (2002) demonstrated that the deletion of an arc in a Bayesian network is more likely to occur as ESS goes asymptotically to zero for a large sample.

artificial intelligence, bdeu, machine learning, (18 more...)

1202.3766

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Tur, Inma, Castelo, Robert

Learning mixed graphical models from data with p larger than n

Structure learning of Gaussian graphical models is an extensively studied problem in the classical multivariate setting where the sample size n is larger than the number of random variables p, as well as in the more challenging setting when p>>n. However, analogous approaches for learning the structure of graphical models with mixed discrete and continuous variables when p>>n remain largely unexplored. Here we describe a statistical learning procedure for this problem based on limited-order correlations and assess its performance with synthetic and real data.

artificial intelligence, machine learning, mixed gmm, (16 more...)

1202.3765

Country: Europe (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Shpitser, Ilya, Richardson, Thomas S., Robins, James M.

An Efficient Algorithm for Computing Interventional Distributions in Latent Variable Causal Models

Probabilistic inference in graphical models is the task of computing marginal and conditional densities of interest from a factorized representation of a joint probability distribution. Inference algorithms such as variable elimination and belief propagation take advantage of constraints embedded in this factorization to compute such densities efficiently. In this paper, we propose an algorithm which computes interventional distributions in latent variable causal models represented by acyclic directed mixed graphs (ADMGs). To compute these distributions efficiently, we take advantage of a recursive factorization which generalizes the usual Markov factorization for DAGs and the more recent factorization for ADMGs. Our algorithm can be viewed as a generalization of variable elimination to the mixed graph case. We show our algorithm is exponential in the mixed graph generalization of treewidth.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1202.3763

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Reyhani, Nima, Hino, Hideitsu, Vigario, Ricardo

New Probabilistic Bounds on Eigenvalues and Eigenvectors of Random Kernel Matrices

Kernel methods are successful approaches for different machine learning problems. This success is mainly rooted in using feature maps and kernel matrices. Some methods rely on the eigenvalues/eigenvectors of the kernel matrix, while for other methods the spectral information can be used to estimate the excess risk. An important question remains on how close the sample eigenvalues/eigenvectors are to the population values. In this paper, we improve earlier results on concentration bounds for eigenvalues of general kernel matrices. Meanwhile, the obstacles for sharper bounds are accounted for and partially addressed. As a case study, we derive a concentration inequality for sample kernel target-alignment. 1 INTRODUCTION Kernel methods such as Spectral Clustering, Kernel Principal Component Analysis(KPCA), and Support Vector Machines, are successful approaches in many practical machine learning and data analysis problems (Steinwart & Christmann, 2008). The main ingredient of these methods is the kernel matrix, which is built using the kernel function, evaluated at given sample points.

artificial intelligence, eigenvalue, machine learning, (17 more...)

1202.3761

Country:

Europe (0.28)
North America (0.28)
Asia > Japan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Rao, Vinayak, Teh, Yee Whye

Fast MCMC sampling for Markov jump processes and continuous time Bayesian networks

Markov jump processes and continuous time Bayesian networks are important classes of continuous time dynamical systems. In this paper, we tackle the problem of inferring unobserved paths in these models by introducing a fast auxiliary variable Gibbs sampler. Our approach is based on the idea of uniformization, and sets up a Markov chain over paths by sampling a finite set of virtual jump times and then running a standard hidden Markov model forward filtering-backward sampling algorithm over states at the set of extant and virtual jump times. We demonstrate significant computational benefits over a state-of-the-art Gibbs sampler on a number of continuous time Bayesian networks.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1202.376

Country: North America (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Poczos, Barnabas, Xiong, Liang, Schneider, Jeff

Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real word images, and astronomical data sets.

artificial intelligence, data mining, machine learning, (19 more...)

1202.3758

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Peters, Jonas, Mooij, Joris, Janzing, Dominik, Schoelkopf, Bernhard

Identifiability of Causal Graphs using Functional Models

This work addresses the following question: Under what assumptions on the data generating process can one infer the causal graph from the joint distribution? The approach taken by conditional independence-based causal discovery methods is based on two assumptions: the Markov condition and faithfulness. It has been shown that under these assumptions the causal graph can be identified up to Markov equivalence (some arrows remain undirected) using methods like the PC algorithm. In this work we propose an alternative by defining Identifiable Functional Model Classes (IFMOCs). As our main theorem we prove that if the data generating process belongs to an IFMOC, one can identify the complete causal graph. To the best of our knowledge this is the first identifiability result of this kind that is not limited to linear functional relationships. We discuss how the IFMOC assumption and the Markov and faithfulness assumptions relate to each other and explain why we believe that the IFMOC assumption can be tested more easily on given data. We further provide a practical algorithm that recovers the causal graph from finitely many data; experiments on simulated data support the theoretical findings.

artificial intelligence, graph, machine learning, (17 more...)

1202.3757

Country:

Europe > Germany (0.28)
North America > Canada (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.63)

Niinimaki, Teppo, Parviainen, Pekka, Koivisto, Mikko

Partial Order MCMC for Structure Discovery in Bayesian Networks

We present a new Markov chain Monte Carlo method for estimating posterior probabilities of structural features in Bayesian networks. The method draws samples from the posterior distribution of partial orders on the nodes; for each sampled partial order, the conditional probabilities of interest are computed exactly. We give both analytical and empirical results that suggest the superiority of the new method compared to previous methods, which sample either directed acyclic graphs or linear orders on the nodes.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1202.3753

Country: Europe > Finland (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Jojic, Nebojsa, Perina, Alessandro

Multidimensional counting grids: Inferring word order from disordered bags of words

Models of bags of words typically assume topic mixing so that the words in a single bag come from a limited number of topics. We show here that many sets of bag of words exhibit a very different pattern of variation than the patterns that are efficiently captured by topic mixing. In many cases, from one bag of words to the next, the words disappear and new ones appear as if the theme slowly and smoothly shifted across documents (providing that the documents are somehow ordered). Examples of latent structure that describe such ordering are easily imagined. For example, the advancement of the date of the news stories is reflected in a smooth change over the theme of the day as certain evolving news stories fall out of favor and new events create new stories. Overlaps among the stories of consecutive days can be modeled by using windows over linearly arranged tight distributions over words. We show here that such strategy can be extended to multiple dimensions and cases where the ordering of data is not readily obvious. We demonstrate that this way of modeling covariation in word occurrences outperforms standard topic models in classification and prediction tasks in applications in biology, text modeling and computer vision.

artificial intelligence, machine learning, natural language, (20 more...)

1202.3752

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

B, Ananda Narayanan, Ravindran, Balaraman

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments that represent this explore/exploit situation. We propose a learning algorithm for bandit problems based on fractional expectation of rewards acquired. The algorithm is theoretically shown to converge on an eta-optimal arm and achieve O(n) sample complexity. Experimental results show the algorithm incurs substantially lower regrets than parameter-optimized eta-greedy and SoftMax approaches and other low sample complexity state-of-the-art techniques.

artificial intelligence, data mining, machine learning, (19 more...)

1202.375

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)