Bayesian Learning
Learning Probabilistic Logic Programs in Continuous Domains
Speichert, Stefanie, Belle, Vaishak
The field of statistical relational learning aims at unifying logic and probability to reason and learn from data. Perhaps the most successful paradigm in the field is probabilistic logic programming: the enabling of stochastic primitives in logic programming, which is now increasingly seen to provide a declarative background to complex machine learning applications. While many systems offer inference capabilities, the more significant challenge is that of learning meaningful and interpretable symbolic representations from data. In that regard, inductive logic programming and related techniques have paved much of the way for the last few decades. Unfortunately, a major limitation of this exciting landscape is that much of the work is limited to finite-domain discrete probability distributions. Recently, a handful of systems have been extended to represent and perform inference with continuous distributions. The problem, of course, is that classical solutions for inference are either restricted to well-known parametric families (e.g., Gaussians) or resort to sampling strategies that provide correct answers only in the limit. When it comes to learning, moreover, inducing representations remains entirely open, other than "data-fitting" solutions that force-fit points to aforementioned parametric families. In this paper, we take the first steps towards inducing probabilistic logic programs for continuous and mixed discrete-continuous data, without being pigeon-holed to a fixed set of distribution families. Our key insight is to leverage techniques from piecewise polynomial function approximation theory, yielding a principled way to learn and compositionally construct density functions. We test the framework and discuss the learned representations.
A survey on policy search algorithms for learning robot controllers in a handful of trials
Chatzilygeroudis, Konstantinos, Vassiliades, Vassilis, Stulp, Freek, Calinon, Sylvain, Mouret, Jean-Baptiste
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
Structured Bayesian Gaussian process latent variable model: applications to data-driven dimensionality reduction and high-dimensional inversion
Atkinson, Steven, Zabaras, Nicholas
We introduce a methodology for nonlinear inverse problems using a variational Bayesian approach where the unknown quantity is a spatial field. A structured Bayesian Gaussian process latent variable model is used both to construct a low-dimensional generative model of the sample-based stochastic prior as well as a surrogate for the forward evaluation. Its Bayesian formulation captures epistemic uncertainty introduced by the limited number of input and output examples, automatically selects an appropriate dimensionality for the learned latent representation of the data, and rigorously propagates the uncertainty of the data-driven dimensionality reduction of the stochastic space through the forward model surrogate. The structured Gaussian process model explicitly leverages spatial information for an informative generative prior to improve sample efficiency while achieving computational tractability through Kronecker product decompositions of the relevant kernel matrices. Importantly, the Bayesian inversion is carried out by solving a variational optimization problem, replacing traditional computationally-expensive Monte Carlo sampling. The methodology is demonstrated on an elliptic PDE and is shown to return well-calibrated posteriors and is tractable with latent spaces with over 100 dimensions.
Fine Tuning Method by using Knowledge Acquisition from Deep Belief Network
Kamada, Shin, Ichimura, Takumi
Deep Learning is well known to be the representative method of artificial intelligence. The representation learning can discover the good set of features to input patterns and calculate the representation itself. Many kinds of structures and learning methods have been developed to achieve the great success. It is often said that Deep Learning should include the hierarchical model deeply, or the discovering of optimal structure and its parameters of Convolutional Neural Network (CNN) [1] is important. This issue pointed out by many researchers is right definitely, however, the effort to find the optimal structure and the parameters is very expensive and the calculation cost becomes high. To realize high level representation at low calculation cost, the self-organizing mechanism to adjust the structure itself and parameters simultaneously should be required with the statistical learning method. We have developed the structural learning method of Restricted Boltzmann Machine (RBM) [2] by neuron generation/annihilation algorithm [3].
Shortening Time Required for Adaptive Structural Learning Method of Deep Belief Network with Multi-Modal Data Arrangement
Kamada, Shin, Ichimura, Takumi
Recently, Deep Learning has been applied in the techniques of artificial intelligence. Especially, Deep Learning performed good results in the field of image recognition. Most new Deep Learning architectures are naturally developed in image recognition. For this reason, not only the numerical data and text data but also the time-series data are transformed to the image data format. Multi-modal data consists of two or more kinds of data such as picture and text. The arrangement in a general method is formed in the squared array with no specific aim. In this paper, the data arrangement are modified according to the similarity of input-output pattern in Adaptive Structural Learning method of Deep Belief Network. The similarity of output signals of hidden neurons is made by the order rearrangement of hidden neurons. The experimental results for the data rearrangement in squared array showed the shortening time required for DBN learning.
Knowledge Extracted from Recurrent Deep Belief Network for Real Time Deterministic Control
Kamada, Shin, Ichimura, Takumi
Recently, the market on deep learning including not only software but also hardware is developing rapidly. Big data is collected through IoT devices and the industry world will analyze them to improve their manufacturing process. Deep Learning has the hierarchical network architecture to represent the complicated features of input patterns. Although deep learning can show the high capability of classification, prediction, and so on, the implementation on GPU devices are required. We may meet the trade-off between the higher precision by deep learning and the higher cost with GPU devices. We can success the knowledge extraction from the trained deep learning with high classification capability. The knowledge that can realize faster inference of pre-trained deep network is extracted as IF-THEN rules from the network signal flow given input data. Some experiment results with benchmark tests for time series data sets showed the effectiveness of our proposed method related to the computational speed.
Adaptive Learning Method of Recurrent Temporal Deep Belief Network to Analyze Time Series Data
Ichimura, Takumi, Kamada, Shin
Deep Learning has the hierarchical network architecture to represent the complicated features of input patterns. Such architecture is well known to represent higher learning capability compared with some conventional models if the best set of parameters in the optimal network structure is found. We have been developing the adaptive learning method that can discover the optimal network structure in Deep Belief Network (DBN). The learning method can construct the network structure with the optimal number of hidden neurons in each Restricted Boltzmann Machine and with the optimal number of layers in the DBN during learning phase. The network structure of the learning method can be self-organized according to given input patterns of big data set. In this paper, we embed the adaptive learning method into the recurrent temporal RBM and the self-generated layer into DBN. In order to verify the effectiveness of our proposed method, the experimental results are higher classification capability than the conventional methods in this paper.
A Hierarchical Bayesian Linear Regression Model with Local Features for Stochastic Dynamics Approximation
Parsa, Behnoosh, Rajasekaran, Keshav, Meier, Franziska, Banerjee, Ashis G.
One of the challenges in model-based control of stochastic dynamical systems is that the state transition dynamics are involved, and it is not easy or efficient to make good-quality predictions of the states. Moreover, there are not many representational models for the majority of autonomous systems, as it is not easy to build a compact model that captures the entire dynamical subtleties and uncertainties. In this work, we present a hierarchical Bayesian linear regression model with local features to learn the dynamics of a micro-robotic system as well as two simpler examples, consisting of a stochastic mass-spring damper and a stochastic double inverted pendulum on a cart. The model is hierarchical since we assume non-stationary priors for the model parameters. These non-stationary priors make the model more flexible by imposing priors on the priors of the model. To solve the maximum likelihood (ML) problem for this hierarchical model, we use the variational expectation maximization (EM) algorithm, and enhance the procedure by introducing hidden target variables. The algorithm yields parsimonious model structures, and consistently provides fast and accurate predictions for all our examples involving large training and test sets. This demonstrates the effectiveness of the method in learning stochastic dynamics, which makes it suitable for future use in a paradigm, such as model-based reinforcement learning, to compute optimal control policies in real time.
Quantification under prior probability shift: the ratio estimator and its extensions
Vaz, Afonso Fernandes, Izbicki, Rafael, Stern, Rafael Bassi
The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to the labels in a sample from the training population but not in the target population. A common assumption in this situation is that of prior probability shift, that is, once the labels are known, the distribution of the features is the same in the training and target populations. In this paper, we derive a new lower bound for the risk of the quantification problem under the prior shift assumption. Complementing this lower bound, we present a new approximately minimax class of estimators, ratio estimators, which generalize several previous proposals in the literature. Using a weaker version of the prior shift assumption, which can be tested, we show that ratio estimators can be used to build confidence intervals for the quantification problem. We also extend the ratio estimator so that it can: (i) incorporate labels from the target population, when they are available and (ii) estimate how the prevalence of positive labels varies according to a function of certain covariates.
Fully Nonparametric Bayesian Additive Regression Trees
George, Edward, Laud, Prakash, Logan, Brent, McCulloch, Robert, Sparapani, Rodney
Bayesian Additive Regression Trees (BART) is a fully Bayesian approach to modeling with ensembles of trees. BART can uncover complex regression functions with high dimensional regressors in a fairly automatic way and provide Bayesian quantification of the uncertainty through the posterior. However, BART assumes IID normal errors. This strong parametric assumption can lead to misleading inference and uncertainty quantification. In this paper, we use the classic Dirichlet process mixture (DPM) mechanism to nonparametrically model the error distribution. A key strength of BART is that default prior settings work reasonably well in a variety of problems. The challenge in extending BART is to choose the parameters of the DPM so that the strengths of the standard BART approach is not lost when the errors are close to normal, but the DPM has the ability to adapt to non-normal errors.