I often hear people say that the results from Bayesian methods are the same as the results from frequentist methods, at least under certain conditions. And sometimes it even comes from people who understand Bayesian methods. Today I saw this tweet from Julia Rohrer: "Running a Bayesian multi-membership multi-level probit model with a custom function to generate average marginal effects only to find that the estimate is precisely the same as the one generated by linear regression with dummy-coded group membership." That elicited what I interpret as good-natured teasing, like this tweet from Daniël Lakens: "I always love it when people realize that the main difference between a frequentist and Bayesian analysis is that for the latter approach you first need to wait 24 hours for the results." Ok, that's funny, but there is a serious point here that I want to respond to, because both of these comments are based on the premise that we can compare the results from Bayesian and frequentist methods.

A/B testing is used everywhere. A/B testing is all about comparing things. If you're a data scientist, and you want to tell the rest of the company, "logo A is better than logo B", well you can't just say that without proving it using numbers and statistics. Traditional A/B testing has been around for a long time, and it's full of approximations and confusing definitions. In this course, while we will do traditional A/B testing in order to appreciate its complexity, what we will eventually get to is the Bayesian machine learning way of doing things. First, we'll see if we can improve on traditional A/B testing with adaptive methods.
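To make the "logo A is better than logo B" comparison concrete, here is a minimal sketch of one common Bayesian approach to an A/B test. It is an illustration only, not the course's own code: it assumes each logo is judged by click-through counts, that each click rate gets a uniform Beta(1, 1) prior, and the function name `prob_b_beats_a` and its parameters are hypothetical.

```python
import random


def prob_b_beats_a(clicks_a, views_a, clicks_b, views_b, draws=20_000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from the two Beta posteriors.

    Assumption: a Beta(1, 1) prior on each click rate, so the posterior is
    Beta(1 + clicks, 1 + views - clicks) for each variant.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls give the same estimate
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + clicks_a, 1 + views_a - clicks_a)
        rate_b = rng.betavariate(1 + clicks_b, 1 + views_b - clicks_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 1000 views per logo, 40 clicks on A and 60 on B.
    print(prob_b_beats_a(40, 1000, 60, 1000))
```

The result is a direct probability statement ("B has a higher click rate than A with probability p") rather than a p-value, which is one reason this framing is often considered easier to explain to the rest of a company.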

Shamir, Gil I., Szpankowski, Wojciech

Theoretical results show that Bayesian methods can achieve lower bounds on regret for online logistic regression. In practice, however, such techniques may not be feasible, especially for very large feature sets. Various approximations must be used that, for huge sparse feature sets, diminish the theoretical advantages. Often, they apply stochastic gradient methods with hyper-parameters that must be tuned on some surrogate loss, defeating the theoretical advantages of Bayesian methods. The surrogate loss, defined to approximate the mixture, requires techniques such as Monte Carlo sampling, increasing computations per example. We propose low complexity analytical approximations for sparse online logistic and probit regressions. Unlike variational inference and other methods, our methods use analytical closed forms, substantially lowering computations. Unlike dense solutions, such as Gaussian Mixtures, our methods allow for sparse problems with huge feature sets without increasing complexity. With the analytical closed forms, there is also no need for applying stochastic gradient methods on surrogate losses, or for tuning and balancing learning and regularization hyper-parameters. Empirical results top the performance of the more computationally involved methods. Like such methods, our methods still reveal per feature and per example uncertainty measures.

Canonical correlation analysis is a statistical technique, dating back at least to [1], that is used to maximally correlate multiple datasets for joint analysis. The technique has become a fundamental tool in biomedical research, where technological advances have led to a huge number of multi-omic datasets ([2]; [3]; [4]). Over the past two decades, limited sample sizes, growing dimensionality, and the search for meaningful biological interpretations have led to the development of sparse canonical correlation analysis ([2]), where a sparsity assumption is imposed on the canonical correlation vectors. This work falls under the topic of the Bayesian estimation of sparse canonical correlation vectors. Model-based approaches to canonical correlation analysis were developed in the mid-2000s (see e.g., [5]), and paved the way for a Bayesian treatment of canonical correlation analysis ([6]; [7]) and sparse canonical correlation analysis ([8]). However, a serious shortcoming of such a Bayesian treatment is that it requires a complete specification of the joint distribution of the data, so as to specify the likelihood function. This requirement is a serious limitation in many applications where the data generating process is poorly understood, for example, image data.

Description: This book provides essential language and tools for understanding statistics, randomness, and uncertainty. The book explores a wide variety of applications and examples, ranging from coincidences and paradoxes to Google PageRank and Markov chain Monte Carlo (MCMC). Additional application areas explored include genetics, medicine, computer science, and information theory. The authors present the material in an accessible style and motivate concepts using real-world examples. Be prepared, it is a big book! Also, check out their great probability cheat sheet here.

Mitros, John, Pakrashi, Arjun, Mac Namee, Brian

Deep neural networks have been successful in diverse discriminative classification tasks, although they are often poorly calibrated, assigning high probability to misclassified predictions. This could undermine the trustworthiness and accountability of the models when deployed in real applications, where predictions are evaluated based on their confidence scores. Existing solutions suggest the benefits attained by combining deep neural networks and Bayesian inference to quantify uncertainty over the models' predictions for ambiguous datapoints. In this work we propose to validate and test the efficacy of likelihood based models in the task of out-of-distribution (OoD) detection. Across different datasets and metrics, we show that Bayesian deep learning models on certain occasions marginally outperform conventional neural networks, and that in the event of minimal overlap between in/out distribution classes, even the best models exhibit a reduction in AUC scores in detecting OoD data. Preliminary investigations indicate the potential inherent role of bias due to choices of initialisation, architecture or activation functions. We hypothesise that the sensitivity of neural networks to unseen inputs could be a multi-factor phenomenon arising from the different architectural design choices, often amplified by the curse of dimensionality. Furthermore, we study the effect of adversarial noise resistance methods on in- and out-of-distribution performance, and also investigate the adversarial noise robustness of Bayesian deep learners.

In this post, I summarize a series of resources to get started with Bayesian Statistics. I compiled these references based on my experience and opinion as to what a good introduction and next steps are in this process. This is not an academic curriculum or anything tremendously rigorous, but it is a comprehensive list that should get you started on the journey of learning or revisiting statistics. Many of the references below were recommended to me in several workshops I've attended, and I want to share them with those who, like me, want to be better at statistics and Machine Learning (ML). The first resource I can think of for beginners interested in Bayesian statistics and modeling is Richard McElreath's Statistical Rethinking.

Machine Learning, often called Artificial Intelligence or AI, is one of the most exciting areas of technology at the moment. New to machine learning and seeking ways to enhance your knowledge? Or maybe you work in an industry with artificial intelligence and need a machine learning course to position yourself for advancement? Either way, a machine learning Coursera course is worth considering. There are introductory courses to choose from if you're just getting started, or you can begin with intermediate or advanced options to level up your knowledge. Benzinga is here to help you find a course that fits your needs and busy lifestyle.

Petrović, Luka V., Scholtes, Ingo

We study the problem of learning the Markov order in categorical sequences that represent paths in a network, i.e. sequences of variable lengths where transitions between states are constrained to a known graph. Such data pose challenges for standard Markov order detection methods and demand modelling techniques that explicitly account for the graph constraint. Adopting a multi-order modelling framework for paths, we develop a Bayesian learning technique that (i) more reliably detects the correct Markov order compared to a competing method based on the likelihood ratio test, (ii) requires considerably less data compared to methods using AIC or BIC, and (iii) is robust against partial knowledge of the underlying constraints. We further show that a recently published method that uses a likelihood ratio test has a tendency to overfit the true Markov order of paths, which is not the case for our Bayesian technique. Our method is important for data scientists analyzing patterns in categorical sequence data that are subject to (partially) known constraints, e.g. sequences with forbidden words, mobility trajectories and click stream data, or sequence data in bioinformatics. Addressing the key challenge of model selection, our work is further relevant for the growing body of research that emphasizes the need for higher-order models in network analysis.

By now you have learned the basics of machine learning and a bit of Python 3 and Pandas. Here are a few next steps, and free resources to get you going. I will keep adding information here as I think of it, or from suggestions in the comments. At this point, you should not read the documentation as if it were a book (although you can do so if this works for you). Browse the documentation top-down to familiarize yourself with the various topics available.