Goto

Collaborating Authors

 Undirected Networks


Markov Chain Monte Carlo for Bayesian Inference - The Metropolis Algorithm - QuantStart

#artificialintelligence

In previous discussions of Bayesian Inference we introduced Bayesian Statistics and considered how to infer a binomial proportion using the concept of conjugate priors. We discussed the fact that not all models can make use of conjugate priors and thus calculation of the posterior distribution would need to be approximated numerically. In this article we introduce the main family of algorithms, known collectively as Markov Chain Monte Carlo (MCMC), that allow us to approximate the posterior distribution as calculated by Bayes' Theorem. In particular, we consider the Metropolis Algorithm, which is easily stated and relatively straightforward to understand. It serves as a useful starting point when learning about MCMC before delving into more sophisticated algorithms such as Metropolis-Hastings, Gibbs Samplers and Hamiltonian Monte Carlo. Once we have described how MCMC works, we will carry it out using the open-source PyMC3 library, which takes care of many of the underlying implementation details, allowing us to concentrate on Bayesian modelling.


Generating Text Using a Markov Model

@machinelearnbot

The generate method takes in a conditional frequency distribution. Think โ€“ how many times did each word appear after'farm'? That is what a conditional frequency distribution outputs (for all words, not just'farm'). The rest of the generate function does is output text based on the distribution observed in the training data. I did this by making an array with each word that appeared after the current word.


Forecasting with the Baum-Welch Algorithm and Hidden Markov Models

@machinelearnbot

Leonard Baum and Lloyd Welch designed a probabilistic modelling algorithm to detect patterns in Hidden Markov Processes. They built upon the theory of probabilistic functions of a Markov Chain and the Expectationโ€“Maximization (EM) Algorithm - an iterative method for finding maximum likelihood or maximum a-posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables. The Baumโ€“Welch Algorithm initially proved to be a remarkable code-breaking and speech recognition tool but also has applications for business, finance, sciences and others. The algorithm finds unknown parameters of a Hidden Markov Model: the maximum likelihood estimate of the parameters of a Hidden Markov Model given a set of observed feature vectors. Two step process: 1. computing a-posteriori probabilities for a given model; and 2. re-estimation of the model parameters.


Statistical Relational Artificial Intelligence: Logic, Probability, and Computation

Morgan & Claypool Publishers

An intelligent agent interacting with the real world will encounter individual people, courses, test results, drugs prescriptions, chairs, boxes, etc., and needs to reason about properties of these individuals and relations among them as well as cope with uncertainty. Uncertainty has been studied in probability theory and graphical models, and relations have been studied in logic, in particular in the predicate calculus and its extensions. This book examines the foundations of combining logic and probability into what are called relational probabilistic models. It introduces representations, inference, and learning techniques for probability, logic, and their combinations. The book focuses on two representations in detail: Markov logic networks, a relational extension of undirected graphical models and weighted first-order predicate calculus formula, and Problog, a probabilistic extension of logic programs that can also be viewed as a Turing-complete relational extension of Bayesian networks.


Resources for Speech Recognition โ€ข /r/MachineLearning

@machinelearnbot

Mohri is most famously known for his work with finite state transducers(FST). So as you can see his very second lecture is on Finite State Automata(FSA). FSTs and FSAs are very powerful formalisms which using the principle of compositionality can be applied to all parts of the speech recognition pipeline - acoustic modelling, context modelling, lexical modelling, and language modelling. If you like getting your hands dirty, Kaldi is a good first place to start:http://kaldi-asr.org/. And the easiest place to start hacking to see what is going on under the hood is the speech decoder.


Tool for Computing Continuous Distributed Representations of Words

@machinelearnbot

Natural language processing (NLP) involves machine learning, artificial intelligence, algorithms and linguistics related to interactions between computers and human languages. One important goal of NLP is to design and build software that will understand and analyze human languages to simplify and optimize human - computer communication. NLP algorithms are usually based on probability theory and machine learning grounded in statistical inference -- to automatically learn rules through analysis of real-world usage. It includes word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, question answering and requires both syntactic and semantic analysis at various levels. NLP applications today involve spelling and grammar correction in word processors, machine translation, sentiment analysis and email spam detection.


Markov Chains explained visually

#artificialintelligence

Markov chains, named after Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. For example, if you made a Markov chain model of a baby's behavior, you might include "playing," "eating", "sleeping," and "crying" as states, which together with other behaviors could form a'state space': a list of all possible states. In addition, on top of the state space, a Markov chain tells you the probabilitiy of hopping, or "transitioning," from one state to any other state---e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first. With two states (A and B) in our state space, there are 4 possible transitions (not 2, because a state can transition back into itself). In this two state diagram, the probability of transitioning from any state to any other state is 0.5.


New Optimisation Methods for Machine Learning

arXiv.org Machine Learning

A thesis submitted for the degree of Doctor of Philosophy of The Australian National University. In this work we introduce several new optimisation methods for problems in machine learning. Our algorithms broadly fall into two categories: optimisation of finite sums and of graph structured objectives. The finite sum problem is simply the minimisation of objective functions that are naturally expressed as a summation over a large number of terms, where each term has a similar or identical weight. Such objectives most often appear in machine learning in the empirical risk minimisation framework in the non-online learning setting. The second category, that of graph structured objectives, consists of objectives that result from applying maximum likelihood to Markov random field models. Unlike the finite sum case, all the non-linearity is contained within a partition function term, which does not readily decompose into a summation. For the finite sum problem, we introduce the Finito and SAGA algorithms, as well as variants of each. For graph-structured problems, we take three complementary approaches. We look at learning the parameters for a fixed structure, learning the structure independently, and learning both simultaneously. Specifically, for the combined approach, we introduce a new method for encouraging graph structures with the "scale-free" property. For the structure learning problem, we establish SHORTCUT, a O(n^{2.5}) expected time approximate structure learning method for Gaussian graphical models. For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.


Mean-Field Inference in Gaussian Restricted Boltzmann Machine

arXiv.org Machine Learning

A Gaussian restricted Boltzmann machine (GRBM) is a Boltzmann machine defined on a bipartite graph and is an extension of usual restricted Boltzmann machines. A GRBM consists of two different layers: a visible layer composed of continuous visible variables and a hidden layer composed of discrete hidden variables. In this paper, we derive two different inference algorithms for GRBMs based on the naive mean-field approximation (NMFA). One is an inference algorithm for whole variables in a GRBM, and the other is an inference algorithm for partial variables in a GBRBM. We compare the two methods analytically and numerically and show that the latter method is better.


Towards An Architecture for Representation, Reasoning and Learning in Human-Robot Collaboration

AAAI Conferences

Robots collaborating with humans need to represent knowledge, reason, and learn, at the sensorimotor level and the cognitive level. This paper summarizes the capabilities of an architecture that combines the comple- mentary strengths of declarative programming, proba- bilistic graphical models, and reinforcement learning, to represent, reason with, and learn from, qualitative and quantitative descriptions of incomplete domain knowledge and uncertainty. Representation and reasoning is based on two tightly-coupled domain representations at different resolutions. For any given task, the coarse- resolution symbolic domain representation is translated to an Answer Set Prolog program, which is solved to provide a tentative plan of abstract actions, and to explain unexpected outcomes. Each abstract action is implemented by translating the relevant subset of the corresponding fine-resolution probabilistic representation to a partially observable Markov decision process (POMDP). Any high probability beliefs, obtained by the execution of actions based on the POMDP policy, update the coarse-resolution representation. When incomplete knowledge of the rules governing the domain dynamics results in plan execution not achieving the desired goal, the coarse-resolution and fine-resolution representations are used to formulate the task of incrementally and interactively discovering these rules as a reinforcement learning problem. These capabilities are illustrated in the context of a mobile robot deployed in an indoor office domain.