Bayesian Learning
Bayesian Optimal Pricing, Part 1
Pricing is a common problem faced by businesses, and one that can be addressed effectively by Bayesian statistical methods. We'll step through a simple example and build the background necessary to get involved with this approach. Let's start with some hypothetical data. A small company has tried a few different price points (say, one week each) and recorded the demand at each price. We'll abstract away some economic issues in order to focus on the statistical approach.
To Build Truly Intelligent Machines, Teach Them Cause and Effect Quanta Magazine
Artificial intelligence owes a lot of its smarts to Judea Pearl. In the 1980s he led efforts that allowed machines to reason probabilistically. In his latest book, "The Book of Why: The New Science of Cause and Effect," he argues that artificial intelligence has been handicapped by an incomplete understanding of what intelligence really is. Three decades ago, a prime challenge in artificial intelligence research was to program machines to associate a potential cause to a set of observable conditions. Pearl figured out how to do that using a scheme called Bayesian networks.
Omega: An Architecture for AI Unification
We introduce the open-ended, modular, self-improving Omega AI unification architecture which is a refinement of Solomonoff's Alpha architecture, as considered from first principles. The architecture embodies several crucial principles of general intelligence including diversity of representations, diversity of data types, integrated memory, modularity, and higher-order cognition. We retain the basic design of a fundamental algorithmic substrate called an "AI kernel" for problem solving and basic cognitive functions like memory, and a larger, modular architecture that re-uses the kernel in many ways. Omega includes eight representation languages and six classes of neural networks, which are briefly introduced. The architecture is intended to initially address data science automation, hence it includes many problem solving methods for statistical tasks. We review the broad software architecture, higher-order cognition, self-improvement, modular neural architectures, intelligent agents, the process and memory hierarchy, hardware abstraction, peer-to-peer computing, and data abstraction facility.
Nonparametric Bayesian volatility learning under microstructure noise
Gugushvili, Shota, van der Meulen, Frank, Schauer, Moritz, Spreij, Peter
Aiming at financial applications, we study the problem of learning the volatility under market microstructure noise. Specifically, we consider noisy discrete time observations from a stochastic differential equation and develop a novel computational method to learn the diffusion coefficient of the equation. We take a nonparametric Bayesian approach, where we model the volatility function a priori as piecewise constant. Its prior is specified via the inverse Gamma Markov chain. Sampling from the posterior is accomplished by incorporating the Forward Filtering Backward Simulation algorithm in the Gibbs sampler. Good performance of the method is demonstrated on two representative synthetic data examples. Finally, we apply the method on the EUR/USD exchange rate dataset.
Data Science: Supervised Machine Learning in Python
In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.
ABC-CDE: Towards Approximate Bayesian Computation with Complex High-Dimensional Data and Limited Simulations
Izbicki, Rafael, Lee, Ann B., Pospisil, Taylor
Approximate Bayesian Computation (ABC) is typically used when the likelihood is either unavailable or intractable but where data can be simulated under different parameter settings using a forward model. Despite the recent interest in ABC, high-dimensional data and costly simulations still remain a bottleneck. There is also no consensus as to how to best assess the performance of such methods without knowing the true posterior. We show how a nonparametric conditional density estimation (CDE) framework, which we refer to as ABC-CDE, helps address three key challenges in ABC: (i) how to efficiently estimate the posterior distribution with limited simulations and different types of data, (ii) how to tune and compare the performance of ABC and related methods in estimating the posterior itself, rather than just certain properties of the density, and (iii) how to efficiently choose among a large set of summary statistics based on a CDE surrogate loss. We provide theoretical and empirical evidence that justifies ABC-CDE procedures that directly estimate and assess the posterior based on an initial ABC sample, and we describe settings where standard ABC and regression-based approaches are inadequate.
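To make the basic ABC idea in this abstract concrete, here is a minimal sketch of plain rejection ABC. It is not the ABC-CDE method the paper proposes. The toy forward model (a normal distribution with known unit standard deviation), the uniform prior on [-5, 5], the sample-mean summary statistic, the tolerance `eps`, and all function names are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical forward model: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def rejection_abc(observed, n_draws=10_000, eps=0.1, seed=0):
    """Keep prior draws whose simulated summary lands close to the observed one."""
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)  # summary statistic: sample mean
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a parameter from the assumed prior
        sim_summary = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) < eps:  # accept if summaries are close
            accepted.append(theta)
    return accepted


# Synthetic "observed" data generated with a true parameter of 2.0.
data_rng = random.Random(1)
observed = simulate(2.0, 50, data_rng)

posterior_sample = rejection_abc(observed)
posterior_mean = statistics.fmean(posterior_sample)
```

The accepted values in `posterior_sample` approximate the posterior without ever evaluating a likelihood. Only a small share of the prior draws is accepted, which illustrates why costly simulators make this basic scheme a bottleneck.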
A Deep Learning Approach with an Attention Mechanism for Automatic Sleep Stage Classification
Längkvist, Martin, Loutfi, Amy
Automatic sleep staging is a challenging problem and state-of-the-art algorithms have not yet reached satisfactory performance to be used instead of manual scoring by a sleep technician. Much research has been done to find good feature representations that extract the useful information to correctly classify each epoch into the correct sleep stage. While many useful features have been discovered, the number of features has grown to an extent that a feature reduction step is necessary in order to avoid the curse of dimensionality. One reason for the need of such a large feature set is that many features are good for discriminating only one of the sleep stages and are less informative during other stages. This paper explores how a second feature representation over a large set of pre-defined features can be learned using an auto-encoder with a selective attention for the current sleep stage in the training batch. This selective attention allows the model to learn feature representations that focus on the more relevant inputs without having to perform any dimensionality reduction of the input data. The performance of the proposed algorithm is evaluated on a large data set of polysomnography (PSG) night recordings of patients with sleep-disordered breathing. The performance of the auto-encoder with selective attention is compared with a regular auto-encoder and previous works using a deep belief network (DBN).
[D] Cross-entropy vs. mean-squared error loss • r/MachineLearning
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model, given observations. MLE attempts to find the parameter values that maximize the likelihood function, given the observations. The resulting estimate is called a maximum likelihood estimate, which is also abbreviated as MLE. The method of maximum likelihood is used with a wide range of statistical analyses. As an example, suppose that we are interested in the heights of adult female penguins, but are unable to measure the height of every penguin in a population (due to cost or time constraints).
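As a small worked illustration of the penguin example, the sketch below assumes the heights follow a normal distribution (an assumption not stated in the excerpt) and uses made-up measurements. Under that assumption the maximum likelihood estimates have closed forms: the sample mean, and the standard deviation computed with divisor n rather than n - 1. The helper names are hypothetical.

```python
import math


def normal_mle(samples):
    """Closed-form MLE of (mu, sigma) for an assumed normal model."""
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n  # divisor n, not n - 1
    return mu_hat, math.sqrt(var_hat)


def normal_log_likelihood(samples, mu, sigma):
    """Log-likelihood of the samples under Normal(mu, sigma)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)


# Hypothetical heights (in cm) of a small sample of penguins.
heights_cm = [61.0, 64.5, 59.0, 66.0, 62.5]

mu_hat, sigma_hat = normal_mle(heights_cm)
best = normal_log_likelihood(heights_cm, mu_hat, sigma_hat)
```

Evaluating `normal_log_likelihood` at any other pair of parameter values gives a value no larger than `best`, which is what it means for the estimate to maximize the likelihood.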
A "quick" introduction to PyMC3 and Bayesian models
We've all been there, maybe 15 minutes before a meeting, at 4 AM after a party, or simply when we feel too lazy to walk. And even though apps like Uber have made it relatively painless, there are still times when it is necessary or practical to just wait for a taxi. So we wait, impatiently, probably while wondering how long we will have to wait. As the name implies, a generative model is a probability model which is able to generate data that looks a lot like the data we might gather from the phenomenon we're trying to model. In our case, we need a model that generates data that looks like waiting times.
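A generative model for waiting times can be sketched in a few lines of plain Python, without PyMC3. The exponential distribution and the 5-minute mean used here are assumptions for illustration only and may differ from the model the article goes on to build; `simulate_waiting_times` is a hypothetical helper.

```python
import random


def simulate_waiting_times(mean_wait_minutes, n, seed=0):
    """Generate n synthetic waiting times from an assumed exponential model."""
    rng = random.Random(seed)
    rate = 1.0 / mean_wait_minutes  # an exponential's rate is 1 / mean
    return [rng.expovariate(rate) for _ in range(n)]


# Simulate 10,000 taxi waits with an assumed average of 5 minutes.
waits = simulate_waiting_times(5.0, 10_000)
sample_mean = sum(waits) / len(waits)
```

If the simulated values look like real waiting times (non-negative, mostly short, occasionally long), the model is a plausible candidate. Inference then runs the other way, from observed waits back to the unknown mean.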
Machine Learning and Its Algorithms to Know – MLAlgos
Linear Regression – Simple linear regression has only one independent variable; multiple linear regression defines a relationship between several independent variables and a dependent variable.
Logistic Regression – A simple form of regression analysis in which the outcome variable is binary or dichotomous. It helps to estimate prevalence rates adjusted for potential confounders (sociodemographic or clinical characteristics).
Linear Discriminant Analysis – A generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
Classification and Regression Trees – Decision trees are an important type of algorithm for predictive modeling in machine learning. They are built by a greedy algorithm based on the divide-and-conquer rule.
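The simple linear regression entry can be illustrated with the ordinary least-squares formulas for one independent variable. This is a minimal sketch: the function name and the toy data (which lie exactly on y = 1 + 2x) are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(xs, ys)
```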