Bayesian Learning
5 things you need to know about A.I.: Cognitive, neural and deep, oh my!
There's never any shortage of buzzwords in the IT world, but when it comes to A.I., they can be hard to tell apart. Here are five things you need to understand. Artificial intelligence refers to "a broad set of methods, algorithms and technologies that make software 'smart' in a way that may seem human-like to an outside observer," said Lynne Parker, director of the division of Information and Intelligent Systems for the National Science Foundation. Machine learning, computer vision, natural language processing, robotics and related topics are all part of A.I., in other words. "Some people may come up with distinctions between the two, but there is not a universal view that the two terms mean anything different," Parker said.
Kernel Approximation Methods for Speech Recognition
May, Avner, Garakani, Alireza Bagheri, Lu, Zhiyun, Guo, Dong, Liu, Kuan, Bellet, Aurélien, Fan, Linxi, Collins, Michael, Hsu, Daniel, Kingsbury, Brian, Picheny, Michael, Sha, Fei
We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics which correlate very strongly with recognition performance when computed on the heldout set; we take advantage of these correlations by monitoring these metrics during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models, while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. (2013) improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.
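As a rough illustration of the random Fourier feature idea mentioned in the abstract, the sketch below approximates a Gaussian (RBF) kernel with an explicit random feature map. It is a minimal standard-library example, not the authors' acoustic model: the function names, the bandwidth `sigma`, the feature count, and the two toy input vectors are all assumptions made for illustration.

```python
import math
import random


def make_rff(dim, num_features, sigma, rng):
    """Draw random projections for an RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    # Frequencies are sampled from the kernel's Fourier transform: N(0, 1/sigma^2).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    # Phase offsets are uniform on [0, 2*pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return weights, offsets


def rff_map(x, weights, offsets):
    """Map x to random features whose inner products approximate the kernel."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, offsets)]


def rbf_kernel(x, y, sigma):
    """Exact RBF kernel value, used here only as a reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, offsets = make_rff(dim=3, num_features=4000, sigma=1.0, rng=rng)

x = [0.2, -0.1, 0.4]  # toy inputs (hypothetical)
y = [0.0, 0.3, 0.1]

z_x = rff_map(x, weights, offsets)
z_y = rff_map(y, weights, offsets)
approx = sum(a * b for a, b in zip(z_x, z_y))
exact = rbf_kernel(x, y, sigma=1.0)
print(f"exact kernel: {exact:.3f}, random-feature estimate: {approx:.3f}")
```

Once inputs are mapped this way, a linear model trained on the features behaves approximately like a kernel machine, which is what makes the approach attractive for large datasets. The approximation error shrinks as the number of random features grows.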
Inferring Cognitive Models from Data using Approximate Bayesian Computation
Kangasrääsiö, Antti, Athukorala, Kumaripaba, Howes, Andrew, Corander, Jukka, Kaski, Samuel, Oulasvirta, Antti
An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data. This is a difficult problem because of the substantial complexity and variety in human behavioral strategies. We report an investigation into a new approach using approximate Bayesian computation (ABC) to condition model parameters on data and prior knowledge. As a case study, we examine menu interaction, where only click-time data are available to infer a cognitive model that implements a search behaviour with parameters such as fixation duration and recall probability. Our results demonstrate that ABC (i) improves estimates of model parameter values, (ii) enables meaningful comparisons between model variants, and (iii) supports fitting models to individual users. ABC provides ample opportunities for theoretical HCI research by allowing principled inference of model parameter values and their uncertainty.
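To make the ABC idea concrete, here is a minimal rejection-sampling sketch. It assumes a toy simulator in which "click times" are Gaussian around a single parameter `theta`, a uniform prior, and the sample mean as the summary statistic. None of these choices come from the paper, whose cognitive model and inference procedure are considerably richer.

```python
import random


def simulate(theta, n, rng):
    """Toy simulator (hypothetical): n noisy click times centred on theta."""
    return [rng.gauss(theta, 0.5) for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def abc_rejection(observed, prior_low, prior_high, eps, num_draws, rng):
    """Keep prior draws whose simulated summary lands within eps of the observed one."""
    obs_summary = mean(observed)
    accepted = []
    for _ in range(num_draws):
        theta = rng.uniform(prior_low, prior_high)   # sample from the prior
        sim_summary = mean(simulate(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) < eps:     # compare summaries
            accepted.append(theta)
    return accepted


rng = random.Random(1)                      # fixed seed for reproducibility
observed = simulate(2.0, 50, rng)           # stand-in for real behavioral data
posterior = abc_rejection(observed, prior_low=0.0, prior_high=5.0,
                          eps=0.1, num_draws=5000, rng=rng)
posterior_mean = mean(posterior)
print(f"accepted {len(posterior)} draws, posterior mean ~ {posterior_mean:.2f}")
```

The accepted draws form an approximate posterior sample, so their spread gives a rough picture of parameter uncertainty. Rejection sampling is the simplest ABC variant and wastes most simulations; it is shown here only because it is the easiest to read.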
Approximation and inference methods for stochastic biochemical kinetics - a tutorial review
Schnoerr, David, Sanguinetti, Guido, Grima, Ramon
Stochastic fluctuations of molecule numbers are ubiquitous in biological systems. Important examples include gene expression and enzymatic processes in living cells. Such systems are typically modelled as chemical reaction networks whose dynamics are governed by the Chemical Master Equation. Despite its simple structure, no analytic solutions to the Chemical Master Equation are known for most systems. Moreover, stochastic simulations are computationally expensive, making systematic analysis and statistical inference a challenging task. Consequently, significant effort has been spent in recent decades on the development of efficient approximation and inference methods. This article gives an introduction to basic modelling concepts as well as an overview of state-of-the-art methods. First, we motivate and introduce deterministic and stochastic methods for modelling chemical networks, and give an overview of simulation and exact solution methods. Next, we discuss several approximation methods, including the chemical Langevin equation, the system size expansion, moment closure approximations, time-scale separation approximations and hybrid methods. We discuss their various properties and review recent advances and remaining challenges for these methods. We present a comparison of several of these methods by means of a numerical case study and highlight some of their respective advantages and disadvantages. Finally, we discuss the problem of inference from experimental data in the Bayesian framework and review recent methods developed in the literature. In summary, this review gives a self-contained introduction to modelling, approximations and inference methods for stochastic chemical kinetics.
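The kind of stochastic simulation the abstract refers to can be sketched with Gillespie's direct method on a birth-death process: molecules are produced at a constant rate and each molecule degrades at a fixed per-molecule rate. The reaction network, the rate constants, and the function names below are assumptions chosen for illustration and are not taken from the review's case study.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, rng):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n) up to t_end."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * n        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)   # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Average molecule number, weighting each state by how long it was held."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


rng = random.Random(2)  # fixed seed for reproducibility
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0,
                                      t_end=200.0, rng=rng)
avg = time_average(times, counts, t_end=200.0)
print(f"{len(times) - 1} reaction events, time-averaged count ~ {avg:.2f}")
```

For this process the stationary mean is `k_prod / k_deg`, so the time average should settle near 10 for the rates used here. Every reaction event is simulated individually, which is why such simulations become expensive for large or fast systems.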
Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models
Miyahara, Hideyuki, Tsumura, Koji, Sughiyama, Yuki
We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established algorithm for computing maximum likelihood estimates and is used in many practical applications. However, it is known that EM depends heavily on initial values and that its estimates are sometimes trapped in local optima. To solve such a problem, quantum annealing (QA) was proposed as a novel optimization approach motivated by quantum mechanics. By employing QA, we then formulate DQAEM and present a theorem that supports its stability. Finally, we present numerical simulations to confirm its efficiency.
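For reference, the baseline that the abstract sets out to improve can be sketched as plain EM for a one-dimensional Gaussian mixture. This is the standard algorithm, not DQAEM; the synthetic data, the two-component setup, and the initial values are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, init_vars, init_weights, num_iters=50):
    """Standard EM for a 1-D Gaussian mixture (no annealing of any kind)."""
    means, variances, weights = list(init_means), list(init_vars), list(init_weights)
    k = len(means)
    for _ in range(num_iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([p_j / total for p_j in p])
        # M-step: re-estimate parameters from the weighted data.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(var_j, 1e-6)   # floor to avoid a collapsing component
            weights[j] = n_j / len(data)
    return means, variances, weights


rng = random.Random(3)  # fixed seed for reproducibility
data = ([rng.gauss(-2.0, 0.5) for _ in range(200)]
        + [rng.gauss(3.0, 0.5) for _ in range(200)])

# Initial means at the data extremes: a benign start chosen for this toy example.
means, variances, weights = em_gmm_1d(
    data, init_means=[min(data), max(data)],
    init_vars=[1.0, 1.0], init_weights=[0.5, 0.5])
print("estimated means:", [round(m, 2) for m in means])
```

With well-separated clusters and this initialization, EM recovers both means. Starting both means close together, or on harder data, is where the dependence on initial values described in the abstract tends to show up.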
Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment
Huang, Yong, Beck, James L., Li, Hui
The focus in this paper is Bayesian system identification based on noisy incomplete modal data where we can impose spatially-sparse stiffness changes when updating a structural model. To this end, based on a similar hierarchical sparse Bayesian learning model from our previous work, we propose two Gibbs sampling algorithms. The algorithms differ in their strategies to deal with the posterior uncertainty of the equation-error precision parameter, but both sample from the conditional posterior probability density functions (PDFs) for the structural stiffness parameters and system modal parameters. The effective dimension for the Gibbs sampling is low because iterative sampling is done from only three conditional posterior PDFs that correspond to three parameter groups, along with sampling of the equation-error precision parameter from another conditional posterior PDF in one of the algorithms where it is not integrated out as a "nuisance" parameter. A nice feature from a computational perspective is that it is not necessary to solve a nonlinear eigenvalue problem of a structural model. The effectiveness and robustness of the proposed algorithms are illustrated by applying them to the IASC-ASCE Phase II simulated and experimental benchmark studies. The goal is to use incomplete modal data identified before and after possible damage to detect and assess spatially-sparse stiffness reductions induced by any damage. Our past and current focus on meeting challenges arising from Bayesian inference of structural stiffness serves to strengthen the capability of vibration-based structural system identification, but our methods also have much broader applicability for inverse problems in science and technology where system matrices are to be inferred from noisy partial information about their eigenquantities.
Machine Learning Algorithms for Business Applications - Complete Guide
With the development of free, open-source machine learning and artificial intelligence tools like Google's TensorFlow and scikit-learn, as well as "ML-as-a-service" products like Google's Cloud Prediction API and Microsoft's Azure Machine Learning platform, it's never been easier for companies of all sizes to harness the power of data. But machine learning is such a vast, complex field. Where do you start learning how to use it in your business? In this article, we'll survey the current landscape of machine learning algorithms and explain how they work, provide example applications, share how other companies use them, and provide further resources on learning about them. This executive overview will provide the first step in learning how to apply machine learning algorithms to make your business more efficient, more effective, and more profitable.
Optimal Inference in Crowdsourced Classification via Belief Propagation
Ok, Jungseul, Oh, Sewoong, Shin, Jinwoo, Yi, Yung
Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers. We study the problem of recovering the true labels from the possibly erroneous crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap by introducing a tighter lower bound on the fundamental limit and proving that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. Experimental results suggest that BP is close to optimal for all regimes considered and improves upon competing state-of-the-art algorithms.
Modeling Grasp Motor Imagery through Deep Conditional Generative Models
Veres, Matthew, Moussa, Medhat, Taylor, Graham W.
Grasping is a complex process involving knowledge of the object, the surroundings, and of oneself. While humans are able to integrate and process all of the sensory information required for performing this task, equipping machines with this capability is an extremely challenging endeavor. In this paper, we investigate how deep learning techniques can allow us to translate high-level concepts such as motor imagery to the problem of robotic grasp synthesis. We explore a paradigm based on generative models for learning integrated object-action representations, and demonstrate its capacity for capturing and generating multimodal, multi-finger grasp configurations on a simulated grasping dataset.
Choosing a Machine Learning Classifier
How do you know what machine learning algorithm to choose for your classification problem? Of course, if you really care about accuracy, your best bet is to test out a couple different ones (making sure to try different parameters within each algorithm as well), and select the best one by cross-validation. But if you're simply looking for a "good enough" algorithm for your problem, or a place to start, here are some general guidelines I've found to work well over the years. If your training set is small, high bias/low variance classifiers (e.g., Naive Bayes) have an advantage over low bias/high variance classifiers (e.g., kNN), since the latter will overfit. But low bias/high variance classifiers start to win out as your training set grows (they have lower asymptotic error), since high bias classifiers aren't powerful enough to provide accurate models.
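The "try a few and pick by cross-validation" advice can be sketched as follows. This is a small standard-library example that compares a Gaussian Naive Bayes classifier with a k-nearest-neighbours classifier using 5-fold cross-validation. The synthetic two-blob dataset, the choice of `k=3`, and all function names are assumptions made for illustration; in practice a library such as scikit-learn would normally do this work.

```python
import math
import random
from collections import Counter, defaultdict


def fit_gaussian_nb(X, y):
    """Per class: log prior plus a (mean, variance) pair for each feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict_gaussian_nb(model, row):
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - mean) ** 2 / (2.0 * var)
            for v, (mean, var) in zip(row, stats))
    return max(model, key=log_posterior)


def predict_knn(X, y, row, k=3):
    """Majority label among the k training points closest to row."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def nb_predictor(X_train, y_train):
    model = fit_gaussian_nb(X_train, y_train)
    return lambda row: predict_gaussian_nb(model, row)


def knn_predictor(X_train, y_train):
    return lambda row: predict_knn(X_train, y_train, row, k=3)


def cross_val_accuracy(X, y, make_predictor, folds=5):
    """Hold out every folds-th point in turn and score on the held-out part."""
    n, correct = len(X), 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        predict = make_predictor(X_train, y_train)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / n


rng = random.Random(4)  # fixed seed for reproducibility
X, y = [], []
for i in range(60):     # small toy dataset: two Gaussian blobs, labels alternate
    label = i % 2
    centre = 0.0 if label == 0 else 2.5
    X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
    y.append(label)

acc_nb = cross_val_accuracy(X, y, nb_predictor)
acc_knn = cross_val_accuracy(X, y, knn_predictor)
print(f"Naive Bayes: {acc_nb:.2f}, 3-NN: {acc_knn:.2f}")
```

Which classifier scores higher will depend on the dataset and its size, so the numbers from a toy problem like this one should not be read as evidence for the general guideline. The point is only the workflow: fit each candidate on the training folds, score on the held-out fold, and compare the averages.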