Bayesian Learning
Causal Discovery from Subsampled Time Series Data by Constraint Optimization
Hyttinen, Antti, Plis, Sergey, Jรคrvisalo, Matti, Eberhardt, Frederick, Danks, David
This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.
Review of the Use of Bayesian Networks in Finance
Bayesian Networks are a tool of new application to the question of risks, in particular for modeling operational risk. Its use for measuring operational risk in the financial sector has channeled large efforts in developing new methods that measure this type of risk which allow improving the internal gestation of the operational processes. Applying Bayesian Networks for modeling operational risk presents the opportunity to incorporate elements of qualitative analysis as well as the opinion of experts in the process of selecting interest variables, defining the structure of the model through its dependencies of causality, such as the specification of a priori distributions and conditional probabilities of each node. It has been found that Bayesian models that incorporate data as well as expert judgment (especially about causality) work better than any other method applicable in the field.
Doing Bayesian Data Analysis: Bayesian models of mind, psychometric models, and data analytic models
Bayesian methods can be used in general data-analytic models, in psychometric models, and in models of mind. In all three applications, there is Bayesian estimation of parameter values in a model. What differs between models is the source of the data and the meaning (semantic referent) of the parameters, as described in the diagram below: As an example of a generic data-analytic model, consider data about ice cream sales and sleeve lengths, measured at different times of year. A linear regression model might show a negative slope for the line that describes a trend in the scatter of points. But the slope does not necessarily describe anything in the processes that generated the ice cream sales and sleeve lengths.
From Dependence to Causation
Machine learning is the science of discovering statistical dependencies in data, and the use of those dependencies to perform predictions. During the last decade, machine learning has made spectacular progress, surpassing human performance in complex tasks such as object recognition, car driving, and computer gaming. However, the central role of prediction in machine learning avoids progress towards general-purpose artificial intelligence. As one way forward, we argue that causal inference is a fundamental component of human intelligence, yet ignored by learning algorithms. Causal inference is the problem of uncovering the cause-effect relationships between the variables of a data generating system. Causal structures provide understanding about how these systems behave under changing, unseen environments. In turn, knowledge about these causal dynamics allows to answer "what if" questions, describing the potential responses of the system under hypothetical manipulations and interventions. Thus, understanding cause and effect is one step from machine learning towards machine reasoning and machine intelligence. But, currently available causal inference algorithms operate in specific regimes, and rely on assumptions that are difficult to verify in practice. This thesis advances the art of causal inference in three different ways. First, we develop a framework for the study of statistical dependence based on copulas and random features. Second, we build on this framework to interpret the problem of causal inference as the task of distribution classification, yielding a family of novel causal inference algorithms. Third, we discover causal structures in convolutional neural network features using our algorithms. The algorithms presented in this thesis are scalable, exhibit strong theoretical guarantees, and achieve state-of-the-art performance in a variety of real-world benchmarks.
Top 10 Machine Learning Algorithms
According to a recent study, machine learning algorithms are expected to replace 25% of the jobs across the world, in the next 10 years. With the rapid growth of big data and availability of programming tools like Python and R โmachine learning is gaining mainstream presence for data scientists. Machine learning applications are highly automated and self-modifying which continue to improve over time with minimal human intervention as they learn with more data. For instance, Netflix's recommendation algorithm learns more about the likes and dislikes of a viewer based on the shows every viewer watches. To address the complex nature of various real world data problems, specialized machine learning algorithms have been developed that solve these problems perfectly.
Probably Overthinking It: Learning to Love Bayesian Statistics
I did a webcast earlier today about Bayesian statistics. Some time in the next week, the video should be available from O'Reilly. In the meantime, you can see my slides here: And here's a transcript of what I said: Thanks everyone for joining me for this webcast. At the bottom of this slide you can see the URL for my slides, so you can follow along at home. I'm Allen Downey and I'm a professor at Olin College, which is a new engineering college right outside Boston. Our mission is to fix engineering education, and one of the ways I'm working on that is by teaching Bayesian statistics. Bayesian methods have been the victim of a 200 year smear campaign. If you are interested in the history and the people involved, I recommend this book, The Theory That Would Not Die.
Learning a metric for class-conditional KNN
Im, Daniel Jiwoong, Taylor, Graham W.
Naive Bayes Nearest Neighbour (NBNN) is a simple and effective framework which addresses many of the pitfalls of K-Nearest Neighbour (KNN) classification. It has yielded competitive results on several computer vision benchmarks. Its central tenet is that during NN search, a query is not compared to every example in a database, ignoring class information. Instead, NN searches are performed within each class, generating a score per class. A key problem with NN techniques, including NBNN, is that they fail when the data representation does not capture perceptual (e.g.~class-based) similarity. NBNN circumvents this by using independent engineered descriptors (e.g.~SIFT). To extend its applicability outside of image-based domains, we propose to learn a metric which captures perceptual similarity. Similar to how Neighbourhood Components Analysis optimizes a differentiable form of KNN classification, we propose "Class Conditional" metric learning (CCML), which optimizes a soft form of the NBNN selection rule. Typical metric learning algorithms learn either a global or local metric. However, our proposed method can be adjusted to a particular level of locality by tuning a single parameter. An empirical evaluation on classification and retrieval tasks demonstrates that our proposed method clearly outperforms existing learned distance metrics across a variety of image and non-image datasets.
The Mathematics of Machine Learning R-bloggers
This post was first published on my Linkedin page and posted here as a contributed post. In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.
Sparse additive Gaussian process with soft interactions
A significant portion of existing variable selection methods are only applicable to linear parametric models. Despite the linearity and additivity assumption, variable selection in linear regression models has been popular since 1970; refer to Akaike information criterion [AIC; Akaike (1973)]; Bayesian information criterion [BIC; Schwarz et al (1978)] and Risk inflation criterion [RIC; Foster and George (1994)]. Popular classical sparse-regression methods such as Least absolute shrinkage operator [LASSO; Tibshirani (1996); Efron et al (2004)], and related penalization methods (Fan and Li, 2001; Zou and Hastie, 2005; Zou, 2006; Zhang, 2010) have gained popularity over the last decade due to their simplicity, computational scalability and efficiency in prediction when the underlying relation between the response and the predictors can be adequately described by parametric models. Bayesian methods (Mitchell and Beauchamp, 1988; George and McCulloch, 1993, 1997) with sparsity inducing priors offers greater applicability beyond parametric models and are a convenient alternative when the underlying goal is in inference and uncertainty quantification. However, there is still a limited amount of literature which seriously considers relaxing the linearity assumption, particularly when the dimension of the predictors is high. Moreover, when the focus is on learning the interactions between the variables, parametric models are often restrictive since they require very many parameters to capture the higher-order interaction terms. 2 Smoothing based non-additive nonparametric regression methods (Lafferty and Wasser-man, 2008; Wahba, 1990; Green and Silverman, 1993; Hastie and Tibshirani, 1990) can accommodate a wide range of relationships between predictors and response leading to excellent predictive performance.
Pseudo-Marginal Hamiltonian Monte Carlo
Lindsten, Fredrik, Doucet, Arnaud
Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables or pseudo-marginal Metropolis-Hastings (MH) schemes which mimic a MH algorithm targeting the marginal posterior of the parameters by approximating unbiasedly the intractable likelihood. In scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly and the pseudo-marginal MH algorithm, as any other MH scheme, will be inefficient for high dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which approximates the HMC algorithm targeting the marginal posterior of the parameters. We demonstrate through experiments that pseudo-marginal HMC can outperform significantly both standard HMC and pseudo-marginal MH schemes.