Bayesian Learning
100 Machine Learning videos you can't find in Google • /r/MachineLearning
Serious answer: I tend to dive deep into a particular algorithm...learning the math better, getting used to different applications of it, etc. So that's where I usually spend my time - along with the advice /u/Jigsus offered...focusing my learning around the kinds of needs I'm working on problem-/data-wise. Sounds like survival analysis, so I try to find as much material focused around that. On the flip side, I haven't done anything like sentiment analysis, so I know next to nothing about Naive Bayes text classification. I tend to read over a rather wide selection of ML and statistics blogs, so I'm not entirely unclear about such things, it's just that I don't spend a copious amount of time other than playing with a toy dataset now and then.
Arimo Predictive Engine (tm) Shows Opportunity to Improve Investor Returns in Peer-to-Peer Lending - Arimo
Random forest model using Lending Club public dataset shows opportunity to improve adjusted return by 2.75%. Arimo recently performed a study using a public dataset provided by Lending Club with the goal of showing how machine learning could improve investor returns. To do this we used the PredictiveEngine component of our Data Intelligence Platform, which provides the ability to easily build a variety of predictive machine learning models which scale transparently when deployed on distributed parallel computing platforms. Lending Club is an online peer-to-peer lending company that connects borrowers with investors who have capital to lend. When a loan application is submitted by a borrower, Lending Club reviews it and decides whether to offer a loan at a risk-adjusted rate or to reject the application. As of the 3rd quarter of 2015, more than $12 billion in loans have been issued through Lending Club.
How To Think Real Good
First, it is a brain dump: too long, epsilon-baked, and unpolished. Second, it is not obviously relevant to the topic of this site. Third, parts are more technical than most readers would want. However, a quick, bad post may be better than none. This post was prompted by discussions about Bayesianism and the LessWrong rationalist community, with Scott Alexander, Catharine G. Evans, muflax, and St. Rev. (among others). They are each brilliant, quirky, articulate, and fascinating; consider following them online! They might disagree with much of this post, though, and are not implicated in its defects. This site concerns ways of thinking about some particularly important things: purpose, self, ethics, authority, and meaning, for instance. My aim is to point out common mistakes in thinking about those things, and how to do better. I enjoy thinking about thinking. That's one reason I spent a dozen years in artificial intelligence research. To make a computer think, you'd need to understand how you think. So AI research is a way of thinking about thinking that forces you to be specific. It calls your bluff if you think you understand thinking, but don't. I thought a lot about how to do AI. In 1988, I put together "How to do research at the MIT AI Lab," a guide for graduate students. Although I edited it, it was a collaboration of many people. There are now many similar guides, some of them better, but this was the first.
Fast methods for training Gaussian processes on large data sets
Moore, Christopher J., Chua, Alvin J. K., Berry, Christopher P. L., Gair, Jonathan R.
Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large data sets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences.
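To make the cost mentioned in the abstract concrete, here is a minimal sketch of the textbook log marginal likelihood (the "evidence") for GPR, computed through a Cholesky factorisation. It is not the fast method the paper derives. The squared-exponential kernel, the hyperparameter names, and the default values are assumptions chosen for illustration. The factorisation is the cubic-cost step that makes large data sets expensive.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    # Squared-exponential covariance between two scalar inputs (assumed kernel).
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_marginal_likelihood(x, y, length_scale, signal_var=1.0, noise_var=0.1):
    """log p(y | x, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(x)
    K = [[sq_exp_kernel(x[i], x[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # O(n^3): the dominant cost for large n
    # Forward substitution: solve L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(v * v for v in z)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # equals -0.5 * log|K|
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)

# Toy data (made up): compare two candidate length scales.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.0, 0.48, 0.84, 1.0, 0.91]
scores = {ls: log_marginal_likelihood(x, y, ls) for ls in (0.1, 1.0)}
```

The learning stage amounts to maximising this quantity over the hyperparameters, and each evaluation repeats the factorisation.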
Unbiased Bayesian Inference for Population Markov Jump Processes via Random Truncations
Georgoulas, Anastasis, Hillston, Jane, Sanguinetti, Guido
We consider continuous time Markovian processes where populations of individual agents interact stochastically according to kinetic rules. Despite the increasing prominence of such models in fields ranging from biology to smart cities, Bayesian inference for such systems remains challenging, as these are continuous time, discrete state systems with potentially infinite state-space. Here we propose a novel efficient algorithm for joint state / parameter posterior sampling in population Markov Jump processes. We introduce a class of pseudo-marginal sampling algorithms based on a random truncation method which enables a principled treatment of infinite state spaces. Extensive evaluation on a number of benchmark models shows that this approach achieves considerable savings compared to state of the art methods, retaining accuracy and fast convergence. We also present results on a synthetic biology data set showing the potential for practical usefulness of our work.
High Dimensional Bayesian Optimisation and Bandits via Additive Models
Kandasamy, Kirthevasan, Schneider, Jeff, Poczos, Barnabas
Bayesian Optimisation (BO) is a technique used in optimising a $D$-dimensional function which is typically expensive to evaluate. While there have been many successes for BO in low dimensions, scaling it to high dimensions has been notoriously difficult. Existing literature on the topic is restricted to very limited settings. In this paper, we identify two key challenges in this endeavour. We tackle these challenges by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret has only linear dependence on $D$ even though the function depends on all $D$ dimensions. We also demonstrate several other statistical and computational benefits in our framework. Via synthetic examples, a scientific simulation and a face detection problem we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive.
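One way to picture the additive assumption is as a covariance function that is a sum of low-dimensional kernels, each looking at only a few coordinates. The sketch below is an illustration under assumptions, not the paper's implementation: the squared-exponential base kernel, the disjoint grouping, and the names `sq_exp`, `additive_kernel` and `groups` are all hypothetical.

```python
import math

def sq_exp(u, v, length_scale=1.0):
    # Squared-exponential kernel on a low-dimensional slice of the input.
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def additive_kernel(x, y, groups, length_scale=1.0):
    """Sum of low-dimensional kernels, one per group of coordinate indices."""
    return sum(
        sq_exp([x[i] for i in g], [y[i] for i in g], length_scale)
        for g in groups
    )

# A 6-dimensional input split into three 2-dimensional groups (assumed grouping).
groups = [(0, 1), (2, 3), (4, 5)]
x = [0.0] * 6
y = [0.0, 0.0, 0.0, 0.0, 3.0, 4.0]
k_xy = additive_kernel(x, y, groups)
```

Because `x` and `y` differ only in the last group, only that group's term moves away from 1, which is the sense in which each component can be modelled separately.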
Logistic Regression and Maximum Entropy explained with examples and code
Logistic Regression is one of the most powerful classification methods within machine learning and can be used for a wide variety of tasks. Think of pre-policing or predictive analytics in health; it can be used to aid tuberculosis patients, aid breast cancer diagnosis, etc. Think of modeling urban growth, analysing mortgage pre-payments and defaults, forecasting the direction and strength of stock market movement, and even predicting sport outcomes. Reading all of this, the theory[1] of Maximum Entropy Classification might look difficult. In my experience, the average developer does not believe they can design a proper Maximum Entropy / Logistic Regression Classifier from scratch. I strongly disagree: not only is the mathematics behind it relatively simple, it can also be implemented with a few lines of code.
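As a rough illustration of the "few lines of code" claim, here is a minimal sketch of binary logistic regression trained by batch gradient descent on the mean log-loss. It is not the article's own code. The function names, the learning rate, the epoch count, and the toy data are assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Estimated probability that x belongs to class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data (made up): the label is 1 when the single feature is large.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
```

After training, `predict_proba(w, b, [0.0])` falls below 0.5 and `predict_proba(w, b, [5.0])` rises above it. A real classifier would add regularisation and a stopping criterion.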
Deep Learning: Definition, Resources, Comparison with Machine Learning
Deep learning is sometimes referred to as the intersection between machine learning and artificial intelligence. It is about designing algorithms that can make robots intelligent, such as face recognition techniques used in drones to detect and target terrorists, or pattern recognition / computer vision algorithms to automatically pilot a plane, a train, a boat or a car. Many deep learning algorithms (clustering, pattern recognition, automated bidding, recommendation engine, and so on) -- even though they appear in new contexts such as IoT or machine to machine communication -- still rely on relatively old-fashioned techniques such as logistic regression, SVM, decision trees, K-NN, naive Bayes, Bayesian modeling, ensembles, random forests, signal processing, filtering, graph theory, game theory, and many others. Some are new, such as indexation algorithms to automate digital publishing, improve search engines, or create and manage large catalogs such as Amazon's product listing. As a result, many deep learning practitioners call themselves data scientist, computer scientist, statistician, or sometimes engineer.
Destination Prediction by Trajectory Distribution Based Model
Besse, Philippe C., Guillouet, Brendan, Loubes, Jean-Michel, Royer, Francois
Monitoring and predicting road traffic is of great importance for traffic managers. With the increase of mobile sensors, such as GPS devices and smartphones, much information is at hand to understand urban traffic. In the last few years, a large amount of research has been conducted in order to use this data to model and analyze road traffic conditions. The aim of this paper is to tackle the issue of predicting the destination of vehicles given a prefix of their trajectory. This problem has been the subject of a Kaggle challenge entitled "ECML/PKDD 15: Taxi Trajectory Prediction (I)" [1]. The observations are time-stamped locations that correspond to the different positions of vehicles moving within a city monitored at different observation times. When dealing with a dataset composed of trajectories, the difficulty lies in the fact that the data convey both spatial information (locations of the vehicles on the map of the city) and temporal information (for each vehicle, the locations are indexed by time, which creates a sequence of locations that compose a full trajectory). Hence the data have a spatiotemporal structure that must be taken into account in order to model their evolution while the trajectories of the destination points to be predicted are unknown. Vehicle trajectories are also constrained to a road network which makes their time progression very irregular.
A Bayesian approach to constrained single- and multi-objective optimization
Feliot, Paul, Bect, Julien, Vazquez, Emmanuel
This article addresses the problem of derivative-free (single- or multi-objective) optimization subject to multiple inequality constraints. Both the objective and constraint functions are assumed to be smooth, non-linear and expensive to evaluate. As a consequence, the number of evaluations that can be used to carry out the optimization is very limited, as in complex industrial design optimization problems. The method we propose to overcome this difficulty has its roots in both the Bayesian and the multi-objective optimization literatures. More specifically, an extended domination rule is used to handle objectives and constraints in a unified way, and a corresponding expected hyper-volume improvement sampling criterion is proposed. This new criterion is naturally adapted to the search of a feasible point when none is available, and reduces to existing Bayesian sampling criteria---the classical Expected Improvement (EI) criterion and some of its constrained/multi-objective extensions---as soon as at least one feasible point is available. The calculation and optimization of the criterion are performed using Sequential Monte Carlo techniques. In particular, an algorithm similar to the subset simulation method, which is well known in the field of structural reliability, is used to estimate the criterion. The method, which we call BMOO (for Bayesian Multi-Objective Optimization), is compared to state-of-the-art algorithms for single- and multi-objective constrained optimization.
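For readers unfamiliar with the classical Expected Improvement criterion that the abstract says the new criterion reduces to, here is a minimal sketch of its closed form for a single unconstrained objective under minimisation. It assumes a Gaussian predictive distribution at the candidate point. It does not implement the paper's expected hyper-volume improvement or its Sequential Monte Carlo estimator. The function and parameter names are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best):
    """Classical EI for minimisation under a Gaussian predictive N(mu, sigma^2).

    mu, sigma: predictive mean and standard deviation at the candidate point.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf

# A candidate predicted to match the current best still has positive EI,
# because its uncertainty leaves room for improvement.
ei_at_best = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
```

The first term rewards candidates whose predicted mean is already below the current best, and the second rewards predictive uncertainty, which is how the criterion trades exploitation against exploration.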