 Learning Graphical Models


Generalized system identification with stable spline kernels

arXiv.org Machine Learning

Regularized least-squares approaches have been successfully applied to linear system identification. Recent approaches use quadratic penalty terms on the unknown impulse response defined by stable spline kernels, which control model space complexity by leveraging regularity and bounded-input bounded-output stability. This paper extends linear system identification to a wide class of nonsmooth stable spline estimators, where regularization functionals and data misfits can be selected from a rich set of piecewise linear quadratic penalties. This class encompasses the 1-norm, Huber, and Vapnik penalties, in addition to the least-squares penalty, and the approach allows linear inequality constraints on the unknown impulse response. We develop a customized interior point solver for the entire class of proposed formulations. By representing penalties through their conjugates, we allow a simple interface that enables the user to specify any piecewise linear quadratic penalty for misfit and regularizer, together with inequality constraints on the response. The solver is locally quadratically convergent, with O(n^2(m+n)) arithmetic operations per iteration, for n impulse response coefficients and m output measurements. In the system identification context, where n << m, IPsolve is competitive with available alternatives, illustrated by a comparison with TFOCS and libSVM. The modeling framework is illustrated with a range of numerical experiments, featuring robust formulations for contaminated data, relaxation systems, and nonnegativity and unimodality constraints on the impulse response. Incorporating constraints yields significant improvements in system identification. The solver used to obtain the results is distributed via an open source code repository.


Is deep learning a Markov chain in disguise?

@machinelearnbot

Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks" made splashes last year. The basic premise is that you can create a recurrent neural network to learn language features character-by-character. But is the resultant model any different from a Markov chain built for the same purpose? I implemented a character-by-character Markov chain in R to find out. First, let's play a variation of the Imitation Game with generated text from Karpathy's tinyshakespeare dataset.
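To make the idea concrete, here is a minimal sketch of a character-level Markov chain. The post describes an implementation in R; this Python version is only an illustration, and the names `build_char_model` and `generate_chars`, as well as the default order of 3, are assumptions rather than anything taken from the post.

```python
import random
from collections import Counter, defaultdict


def build_char_model(text, order=3):
    """Map each `order`-character context to a Counter of the characters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate_chars(model, seed, length=100, rng=None):
    """Extend `seed` one character at a time; len(seed) must equal the model order (>= 1)."""
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:
            # Context never seen in training: stop early (a simplifying assumption).
            break
        chars, weights = zip(*counts.items())
        # Sample the next character in proportion to its observed frequency.
        out += rng.choices(chars, weights=weights)[0]
    return out
```

Trained on a corpus such as tinyshakespeare, a call like `generate_chars(model, "ROM", 200)` would produce text whose local character statistics mimic the corpus, which is the comparison the post sets up against the recurrent network.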


Markov Chain Monte Carlo for Bayesian Inference - The Metropolis Algorithm - QuantStart

#artificialintelligence

In previous discussions of Bayesian Inference we introduced Bayesian Statistics and considered how to infer a binomial proportion using the concept of conjugate priors. We discussed the fact that not all models can make use of conjugate priors and thus calculation of the posterior distribution would need to be approximated numerically. In this article we introduce the main family of algorithms, known collectively as Markov Chain Monte Carlo (MCMC), that allow us to approximate the posterior distribution as calculated by Bayes' Theorem. In particular, we consider the Metropolis Algorithm, which is easily stated and relatively straightforward to understand. It serves as a useful starting point when learning about MCMC before delving into more sophisticated algorithms such as Metropolis-Hastings, Gibbs Samplers and Hamiltonian Monte Carlo. Once we have described how MCMC works, we will carry it out using the open-source PyMC3 library, which takes care of many of the underlying implementation details, allowing us to concentrate on Bayesian modelling.
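As a rough illustration of the Metropolis Algorithm mentioned above, the following hand-rolled sketch samples the posterior of a binomial proportion under a flat prior. It does not use PyMC3; the function names, the Gaussian proposal width `step`, and the starting value of 0.5 are assumptions chosen for the example.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Unnormalised log posterior of a binomial proportion under a flat prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, n_samples=20000, step=0.1, seed=0):
    """Draw correlated samples from the posterior using a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = 0.5  # assumed starting point
    current = log_posterior(theta, successes, trials)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior(proposal) / posterior(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples
```

With a flat prior the exact posterior is Beta(successes + 1, trials - successes + 1), so the sample mean after discarding an initial burn-in can be checked against (successes + 1) / (trials + 2).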


Generating Text Using a Markov Model

@machinelearnbot

The generate method takes in a conditional frequency distribution. Think – how many times did each word appear after 'farm'? That is what a conditional frequency distribution outputs (for all words, not just 'farm'). All the rest of the generate function does is output text based on the distribution observed in the training data. I did this by making an array with each word that appeared after the current word.
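The approach described above can be sketched in plain Python as follows. This is not the article's code: `build_followers` stands in for the conditional frequency distribution, and both function names and the `num_words` parameter are hypothetical.

```python
import random
from collections import defaultdict


def build_followers(words):
    """For every word, list each word that appeared right after it (repeats kept)."""
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, num_words=15, rng=None):
    """Walk the chain: repeatedly pick a random follower of the current word."""
    rng = rng or random.Random()
    word = start
    out = [word]
    for _ in range(num_words - 1):
        options = followers.get(word)
        if not options:
            # The current word was never followed by anything in training.
            break
        # Repeats in the list make frequent followers proportionally more likely.
        word = rng.choice(options)
        out.append(word)
    return " ".join(out)
```

Keeping duplicates in the follower list is what makes a uniform `choice` reproduce the observed frequencies without storing explicit counts.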


k-nearest neighbor algorithm using Python

#artificialintelligence

This article was written by Natasha Latysheva. Here we publish a short version, with references to full source code in the original article. In machine learning, you may often wish to build predictors that allow you to classify things into categories based on some set of associated values. For example, it is possible to provide a diagnosis to a patient based on data from previous patients. Many algorithms have been developed for automated classification, and common ones include random forests, support vector machines, Naïve Bayes classifiers, and many types of neural networks.
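For readers who want a feel for the algorithm named in the title, here is a minimal k-nearest neighbor sketch. It is not the original article's code; the function name `knn_predict`, the use of Euclidean distance, and the default `k=3` are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In the diagnosis example, `train_points` would hold measurements from previous patients, `train_labels` their diagnoses, and `query` the measurements of the new patient.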


The best kept secret about linear and logistic regression

@machinelearnbot

All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or regression parameters). The non-statistical approach is also more robust than theory described in all statistics textbooks and taught in all statistical courses. It does not require Map-Reduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (Newton algorithm). It is indeed incredibly simple, robust, easy to interpret, and easy to code (no statistical libraries required).


Bayesian Networks & BayesiaLab: A Practical Introduction for Researchers

#artificialintelligence

This practical introduction is geared towards scientists who wish to employ Bayesian networks for applied research using the BayesiaLab software platform. Through numerous examples, this book illustrates how implementing Bayesian networks involves concepts from many disciplines, including computer science, probability theory, information theory, machine learning, and statistics. Each chapter explores a real-world problem domain, exploring aspects of Bayesian networks and simultaneously introducing functions of BayesiaLab. The book can serve as a self-study guide for learners and as a reference manual for advanced practitioners.


Forecasting with the Baum-Welch Algorithm and Hidden Markov Models

@machinelearnbot

Leonard Baum and Lloyd Welch designed a probabilistic modelling algorithm to detect patterns in Hidden Markov Processes. They built upon the theory of probabilistic functions of a Markov Chain and the Expectation–Maximization (EM) Algorithm - an iterative method for finding maximum likelihood or maximum a-posteriori estimates of parameters in statistical models, where the model depends on unobserved latent variables. The Baum–Welch Algorithm initially proved to be a remarkable code-breaking and speech recognition tool but also has applications for business, finance, sciences and others. The algorithm finds unknown parameters of a Hidden Markov Model: the maximum likelihood estimate of the parameters of a Hidden Markov Model given a set of observed feature vectors. It is a two-step process: 1. computing a-posteriori probabilities for a given model; and 2. re-estimating the model parameters.
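The two steps can be sketched for a small discrete Hidden Markov Model as below. This is an illustrative single iteration on one observation sequence, not the article's implementation: the function names are hypothetical, `pi`, `A`, and `B` denote the initial, transition, and emission probabilities, and no numerical scaling is applied, so it is only suitable for short sequences.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(obs[0..t], state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(obs[t+1..] | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM iteration; returns updated (pi, A, B) and the likelihood under the old model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # Step 1: a-posteriori probabilities of states (gamma) and transitions (xi).
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # Step 2: re-estimate the parameters from the expected counts.
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood
```

Calling `baum_welch_step` repeatedly, feeding each output back in as the next input, should never decrease the likelihood of the observations, which is the usual EM guarantee.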


Statistical Relational Artificial Intelligence: Logic, Probability, and Computation

Morgan & Claypool Publishers

An intelligent agent interacting with the real world will encounter individual people, courses, test results, drug prescriptions, chairs, boxes, etc., and needs to reason about properties of these individuals and relations among them as well as cope with uncertainty. Uncertainty has been studied in probability theory and graphical models, and relations have been studied in logic, in particular in the predicate calculus and its extensions. This book examines the foundations of combining logic and probability into what are called relational probabilistic models. It introduces representations, inference, and learning techniques for probability, logic, and their combinations. The book focuses on two representations in detail: Markov logic networks, a relational extension of undirected graphical models and weighted first-order predicate calculus formulas, and Problog, a probabilistic extension of logic programs that can also be viewed as a Turing-complete relational extension of Bayesian networks.


Resources for Speech Recognition • /r/MachineLearning

@machinelearnbot

Mohri is most famously known for his work with finite state transducers (FSTs). So as you can see, his very second lecture is on finite state automata (FSAs). FSTs and FSAs are very powerful formalisms which, using the principle of compositionality, can be applied to all parts of the speech recognition pipeline - acoustic modelling, context modelling, lexical modelling, and language modelling. If you like getting your hands dirty, Kaldi is a good first place to start: http://kaldi-asr.org/. And the easiest place to start hacking to see what is going on under the hood is the speech decoder.