Bayesian Inference
An Introduction to Model-Based Machine Learning - Data Science Blog by Domino
This guest post was written by Daniel Emaasit, a Ph.D. student of Transportation Engineering at the University of Nevada, Las Vegas. Daniel's research interests include the development of probabilistic machine learning methods for high-dimensional data, with applications to urban mobility, transport planning, highway safety, and traffic operations. Don't miss Daniel's webinar on Model-Based Machine Learning and Probabilistic Programming using RStan, scheduled for July 20, 2016 at 11:00 AM PST. This blog post follows my journey from traditional statistical modeling to Machine Learning (ML) and introduces a new paradigm of ML called Model-Based Machine Learning (Bishop, 2013). Model-Based Machine Learning may be of particular interest to statisticians, engineers, or related professionals looking to implement machine learning in their research or practice. During my Master's in Transportation Engineering (2011-2013), I used traditional statistical modeling in my research to study transportation-related problems such as highway crashes.
Bayesian Machine Learning, Explained
So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together - we know it took us a while. This article is an introduction we wish we had back then. While we have some grasp on the matter, we're not experts, so the following might contain inaccuracies or even outright errors. Feel free to point them out, either in the comments or privately.
Causal Discovery from Subsampled Time Series Data by Constraint Optimization
Hyttinen, Antti, Plis, Sergey, Järvisalo, Matti, Eberhardt, Frederick, Danks, David
This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.
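The search problem in the abstract above can be illustrated with a toy sketch. It rests on simplifying assumptions that are not taken from the paper: an edge i -> j in the system-timescale graph means variable i at time t-1 influences variable j at time t, a directed edge appears at subsampling rate u exactly when a directed path of length u exists, and bidirected (latent confounding) edges are ignored. Brute-force enumeration stands in for the paper's constraint satisfaction procedure, and the names `subsample` and `consistent_graphs` are hypothetical.

```python
from itertools import product


def subsample(graph, u):
    """Directed edges seen when only every u-th time step is measured.

    graph[i][j] is True when i -> j at the system timescale. Under the
    simplifying assumption used here, i -> j appears at rate u exactly
    when a directed path of length u leads from i to j.
    """
    n = len(graph)
    reach = [row[:] for row in graph]
    for _ in range(u - 1):
        # Extend every known path by one more system-timescale edge.
        reach = [
            [any(reach[i][k] and graph[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return reach


def consistent_graphs(observed, u):
    """Enumerate all system-timescale graphs whose rate-u subsampling equals `observed`.

    Exhaustive search over 2**(n*n) graphs, so only usable for tiny n.
    """
    n = len(observed)
    matches = []
    for bits in product([False, True], repeat=n * n):
        g = [list(bits[r * n:(r + 1) * n]) for r in range(n)]
        if subsample(g, u) == observed:
            matches.append(g)
    return matches


# Hypothetical 3-variable system: a directed cycle 0 -> 1 -> 2 -> 0.
true_graph = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
observed = subsample(true_graph, 2)
matches = consistent_graphs(observed, 2)
print(len(matches), true_graph in matches)
```

The exhaustive loop grows as 2^(n^2), which is why a dedicated constraint solver is needed for anything beyond a handful of variables.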
Review of the Use of Bayesian Networks in Finance
Bayesian networks are a relatively new tool for risk analysis, in particular for modeling operational risk. Their use for measuring operational risk in the financial sector has channeled considerable effort into developing new methods for measuring this type of risk, which in turn help improve the internal management of operational processes. Applying Bayesian networks to operational risk makes it possible to incorporate qualitative analysis and expert opinion when selecting the variables of interest, defining the structure of the model through its causal dependencies, and specifying the prior distributions and conditional probabilities of each node. It has been found that Bayesian models that incorporate data as well as expert judgment (especially about causality) work better than any other method applicable in the field.
Doing Bayesian Data Analysis: Bayesian models of mind, psychometric models, and data analytic models
Bayesian methods can be used in general data-analytic models, in psychometric models, and in models of mind. In all three applications, there is Bayesian estimation of parameter values in a model. What differs between models is the source of the data and the meaning (semantic referent) of the parameters, as described in the diagram below: As an example of a generic data-analytic model, consider data about ice cream sales and sleeve lengths, measured at different times of year. A linear regression model might show a negative slope for the line that describes a trend in the scatter of points. But the slope does not necessarily describe anything in the processes that generated the ice cream sales and sleeve lengths.
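A minimal sketch of Bayesian estimation of a regression slope, in the spirit of the ice cream example above. Everything specific here is an assumption made for illustration: the data are synthetic, both variables are treated as centered so no intercept is needed, the noise standard deviation is taken as known, the prior on the slope is a wide normal, and the posterior is approximated on a grid rather than sampled.

```python
import math
import random


def grid_posterior_slope(xs, ys, slopes, sigma=1.0, prior_sd=10.0):
    """Posterior probabilities for slope b in y = b * x + noise, on a grid.

    Assumes Gaussian noise with known standard deviation `sigma` and a
    Normal(0, prior_sd) prior on the slope (both illustrative choices).
    """
    log_post = []
    for b in slopes:
        log_prior = -0.5 * (b / prior_sd) ** 2
        log_lik = sum(-0.5 * ((y - b * x) / sigma) ** 2 for x, y in zip(xs, ys))
        log_post.append(log_prior + log_lik)
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic, centered data: longer sleeves go with lower ice cream sales.
random.seed(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [-1.5 * x + random.gauss(0, 1.0) for x in xs]

slopes = [i / 100 for i in range(-400, 401)]
post = grid_posterior_slope(xs, ys, slopes)
post_mean = sum(b * p for b, p in zip(slopes, post))
print(round(post_mean, 2))
```

The posterior concentrates on negative slopes, yet, as the passage notes, that says nothing about the process generating the data: in this sketch the slope is simply a descriptive parameter of the trend.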
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking several response surfaces. Namely, given $L \ge 2$ response surfaces over a continuous input space $\cal X$, the aim is to efficiently find the index of the minimal response across the entire $\cal X$. The response surfaces are not known and have to be noisily sampled one-at-a-time. This setting is motivated by stochastic control applications and requires joint experimental design both in space and response-index dimensions. To generate sequential design heuristics we investigate stepwise uncertainty reduction approaches, as well as sampling based on posterior classification complexity. We also make connections between our continuous-input formulation and the discrete framework of pure regret in multi-armed bandits. To model the response surfaces we utilize kriging surrogates. Several numerical examples using both synthetic data and an epidemics control problem are provided to illustrate our approach and the efficacy of respective adaptive designs.
From Dependence to Causation
Machine learning is the science of discovering statistical dependencies in data, and the use of those dependencies to perform predictions. During the last decade, machine learning has made spectacular progress, surpassing human performance in complex tasks such as object recognition, car driving, and computer gaming. However, the central role of prediction in machine learning hinders progress towards general-purpose artificial intelligence. As one way forward, we argue that causal inference is a fundamental component of human intelligence, yet ignored by learning algorithms. Causal inference is the problem of uncovering the cause-effect relationships between the variables of a data generating system. Causal structures provide understanding about how these systems behave under changing, unseen environments. In turn, knowledge about these causal dynamics allows us to answer "what if" questions, describing the potential responses of the system under hypothetical manipulations and interventions. Thus, understanding cause and effect is one step from machine learning towards machine reasoning and machine intelligence. But currently available causal inference algorithms operate in specific regimes, and rely on assumptions that are difficult to verify in practice. This thesis advances the art of causal inference in three different ways. First, we develop a framework for the study of statistical dependence based on copulas and random features. Second, we build on this framework to interpret the problem of causal inference as the task of distribution classification, yielding a family of novel causal inference algorithms. Third, we discover causal structures in convolutional neural network features using our algorithms. The algorithms presented in this thesis are scalable, exhibit strong theoretical guarantees, and achieve state-of-the-art performance in a variety of real-world benchmarks.
Probably Overthinking It: Learning to Love Bayesian Statistics
I did a webcast earlier today about Bayesian statistics. Some time in the next week, the video should be available from O'Reilly. In the meantime, you can see my slides here: And here's a transcript of what I said: Thanks everyone for joining me for this webcast. At the bottom of this slide you can see the URL for my slides, so you can follow along at home. I'm Allen Downey and I'm a professor at Olin College, which is a new engineering college right outside Boston. Our mission is to fix engineering education, and one of the ways I'm working on that is by teaching Bayesian statistics. Bayesian methods have been the victim of a 200-year smear campaign. If you are interested in the history and the people involved, I recommend this book, The Theory That Would Not Die.
The Mathematics of Machine Learning - R-bloggers
This post was first published on my LinkedIn page and posted here as a contributed post. In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.
Sparse additive Gaussian process with soft interactions
A significant portion of existing variable selection methods are only applicable to linear parametric models. Despite the linearity and additivity assumption, variable selection in linear regression models has been popular since 1970; refer to Akaike information criterion [AIC; Akaike (1973)]; Bayesian information criterion [BIC; Schwarz et al (1978)] and Risk inflation criterion [RIC; Foster and George (1994)]. Popular classical sparse-regression methods such as the least absolute shrinkage and selection operator [LASSO; Tibshirani (1996); Efron et al (2004)], and related penalization methods (Fan and Li, 2001; Zou and Hastie, 2005; Zou, 2006; Zhang, 2010) have gained popularity over the last decade due to their simplicity, computational scalability and efficiency in prediction when the underlying relation between the response and the predictors can be adequately described by parametric models. Bayesian methods (Mitchell and Beauchamp, 1988; George and McCulloch, 1993, 1997) with sparsity inducing priors offer greater applicability beyond parametric models and are a convenient alternative when the underlying goal is inference and uncertainty quantification. However, there is still a limited amount of literature which seriously considers relaxing the linearity assumption, particularly when the dimension of the predictors is high. Moreover, when the focus is on learning the interactions between the variables, parametric models are often restrictive since they require very many parameters to capture the higher-order interaction terms. Smoothing based non-additive nonparametric regression methods (Lafferty and Wasserman, 2008; Wahba, 1990; Green and Silverman, 1993; Hastie and Tibshirani, 1990) can accommodate a wide range of relationships between predictors and response leading to excellent predictive performance.
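For reference, the two information criteria and the LASSO penalty named above are usually written as follows. The notation is an assumption introduced here, since the excerpt defines no symbols: $\hat{L}$ is the maximized likelihood, $k$ the number of fitted parameters, $n$ the sample size, $X$ the design matrix, $y$ the response vector, and $\lambda \ge 0$ a tuning parameter. Scaling conventions for the LASSO loss vary between references.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

$$
\hat{\beta}_{\mathrm{LASSO}} = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
$$

Both criteria trade goodness of fit against model size, with BIC penalizing each additional parameter more heavily than AIC once $\ln n > 2$. The $\ell_1$ penalty in the LASSO sets some coefficients exactly to zero, which is what makes it a variable selection method.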