AITopics | Learning Graphical Models

Collaborating Authors

Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Loss-sensitive Training of Probabilistic Conditional Random Fields

Volkovs, Maksims N., Larochelle, Hugo, Zemel, Richard S.

arXiv.org Machine LearningJul-9-2011

We consider the problem of training probabilistic conditional random fields (CRFs) in the context of a task where performance is measured using a specific loss function. While maximum likelihood is the most common approach to training CRFs, it ignores the inherent structure of the task's loss function. We describe alternatives to maximum likelihood which take that loss into account. These include a novel adaptation of a loss upper bound from the structured SVMs literature to the CRF context, as well as a new loss-inspired KL divergence objective which relies on the probabilistic nature of CRFs. These loss-sensitive objectives are compared to maximum likelihood using ranking as a benchmark task. This comparison confirms the importance of incorporating loss information in the probabilistic training of CRFs, with the loss-inspired KL outperforming all other objectives.

artificial intelligence, machine learning, objective, (18 more...)

arXiv.org Machine Learning

1107.1805

Country:

North America > United States (0.93)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.77)

Add feedback

Predicting Solution Cost with Conditional Probabilities

Lelis, Levi (University of Alberta) | Stern, Roni (Ben Gurion University) | Arfaee, Shahab Jabbari (University of Alberta)

AAAI ConferencesJul-5-2011

Classical heuristic search algorithms find the solution cost of a problem while finding the path from the start state to a goal state. However, there are applications in which finding the path is not needed. In this paper we propose an algorithm that accurately and efficiently predicts the solution cost of a problem without finding the actual solution. We show empirically that our predictor makes more accurate predictions when compared to the bootstrapped heuristic, which is known to be a very accurate inadmissible heuristic. In addition, we show how our prediction algorithm can be used to enhance heuristic search algorithms. Namely, we use our predictor to calculate a bound for a bounded best-first search algorithm and to tune the w-value of Weighted IDA*. In both cases major search speedups were observed.

algorithm, prediction, solution cost, (17 more...)

AAAI Conferences

Fourth Annual Symposium on Combinatorial Search

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Add feedback

Proximal Methods for Hierarchical Sparse Coding

Jenatton, Rodolphe, Mairal, Julien, Obozinski, Guillaume, Bach, Francis

arXiv.org Machine LearningJul-5-2011

Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional ones using the L1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize in a prespecified arborescent structure, leading to a better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

1009.2139

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Add feedback

A Variational Bayes Approach to Decoding in a Phase-Uncertain Digital Receiver

Das, Arijit, Quinn, Anthony

arXiv.org Machine LearningJul-4-2011

This paper presents a Bayesian approach to symbol and phase inference in a phase-unsynchronized digital receiver. It primarily extends [Quinn 2011] to the multi-symbol case, using the variational Bayes (VB) approximation to deal with the combinatorial complexity of the phase inference in this case. The work provides a fully Bayesian extension of the EM-based framework underlying current turbo-synchronization methods, since it induces a von Mises prior on the time-invariant phase parmeter. As a result, we achieve tractable iterative algorithms with improved robustness in low SNR regimes, compared to the current EM-based approaches. As a corollary to our analysis we also discover the importance of prior regularization in elegantly tackling the significant problem of phase ambiguity.

artificial intelligence, machine learning, posterior distribution, (15 more...)

arXiv.org Machine Learning

1107.0662

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Law of Connectivity in Machine Learning

Dundas, Jitesh

arXiv.org Artificial IntelligenceJul-1-2011

We present in this paper our law that there is always a connection present between two entities, with a selfconnection being present at least in each node. An entity is an object, physical or imaginary, that is connected by a path (or connection) and which is important for achieving the desired result of the scenario. In machine learning, we state that for any scenario, a subject entity is always, directly or indirectly, connected and affected by single or multiple independent / dependent entities, and their impact on the subject entity is dependent on various factors falling into the categories such as the existenc

artificial intelligence, interaction, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1107.0194

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

Sequential Diagnosis by Abstraction

Siddiqi, S. A., Huang, J.

Journal of Artificial Intelligence ResearchJun-30-2011

When a system behaves abnormally, sequential diagnosis takes a sequence of measurements of the system until the faults causing the abnormality are identified, and the goal is to reduce the diagnostic cost, defined here as the number of measurements. To propose measurement points, previous work employs a heuristic based on reducing the entropy over a computed set of diagnoses. This approach generally has good performance in terms of diagnostic cost, but can fail to diagnose large systems when the set of diagnoses is too large. Focusing on a smaller set of probable diagnoses scales the approach but generally leads to increased average diagnostic costs. In this paper, we propose a new diagnostic framework employing four new techniques, which scales to much larger systems with good performance in terms of diagnostic cost. First, we propose a new heuristic for measurement point selection that can be computed efficiently, without requiring the set of diagnoses, once the system is modeled as a Bayesian network and compiled into a logical form known as d-DNNF. Second, we extend hierarchical diagnosis, a technique based on system abstraction from our previous work, to handle probabilities so that it can be applied to sequential diagnosis to allow larger systems to be diagnosed. Third, for the largest systems where even hierarchical diagnosis fails, we propose a novel method that converts the system into one that has a smaller abstraction and whose diagnoses form a superset of those of the original system; the new system can then be diagnosed and the result mapped back to the original system. Finally, we propose a novel cost estimation function which can be used to choose an abstraction of the system that is more likely to provide optimal average cost. Experiments with ISCAS-85 benchmark circuits indicate that our approach scales to all circuits in the suite except one that has a flat structure not susceptible to useful abstraction.

abstraction, cone, diagnosis, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3296

AI Access Foundation

10713

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

A Comprehensive Trainable Error Model for Sung Music Queries

Birmingham, W. P., Meek, C. J.

arXiv.org Artificial IntelligenceJun-30-2011

We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1334

1107.0054

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

On Prediction Using Variable Order Markov Models

Begleiter, R., El-Yaniv, R., Yona, G.

arXiv.org Artificial IntelligenceJun-30-2011

This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1491

1107.0051

Country:

North America > United States (0.92)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Media > Music (0.92)
Leisure & Entertainment (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis

Goldman, C. V., Zilberstein, S.

arXiv.org Artificial IntelligenceJun-30-2011

The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1427

1107.0047

Country:

North America > United States > California (0.46)
North America > United States > Massachusetts (0.28)
North America > Canada > Alberta (0.28)

Genre: Research Report (0.64)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Add feedback

Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms

Derbeko, P., El-Yaniv, R., Meir, R.

arXiv.org Artificial IntelligenceJun-30-2011

Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used in order to construct useful data dependent prior distributions over the hypothesis space.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1417

1107.0046

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback