Learning Graphical Models
Unbiased Bayesian Inference for Population Markov Jump Processes via Random Truncations
Georgoulas, Anastasis, Hillston, Jane, Sanguinetti, Guido
We consider continuous-time Markovian processes in which populations of individual agents interact stochastically according to kinetic rules. Despite the increasing prominence of such models in fields ranging from biology to smart cities, Bayesian inference for such systems remains challenging, as these are continuous-time, discrete-state systems with potentially infinite state spaces. Here we propose a novel, efficient algorithm for joint state/parameter posterior sampling in population Markov jump processes. We introduce a class of pseudo-marginal sampling algorithms based on a random truncation method, which enables a principled treatment of infinite state spaces. Extensive evaluation on a number of benchmark models shows that this approach achieves considerable savings compared to state-of-the-art methods while retaining accuracy and fast convergence. We also present results on a synthetic biology data set showing the potential practical usefulness of our work.
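The random truncation idea can be illustrated on a plain infinite series. The sketch below is a minimal example of the generic technique (unbiased estimation of an infinite sum by stopping at a random level and reweighting each kept term by the probability of reaching it), not the authors' sampler. The function name `random_truncation_estimate` and the continuation probability `q` are hypothetical choices for illustration. In the paper's setting the series would be the likelihood over an unbounded state space; an unbiased estimate of this kind is what a pseudo-marginal sampler plugs in place of the exact likelihood.

```python
import random
from typing import Callable


def random_truncation_estimate(term: Callable[[int], float], q: float,
                               rng: random.Random) -> float:
    """Return one unbiased estimate of sum_{n>=0} term(n).

    After each included term we continue with probability q, so the random
    truncation level N satisfies P(N >= n) = q**n. Dividing term n by that
    survival probability makes the expected value equal to the full sum
    (assuming the series and the reweighted series are well behaved).
    """
    total = 0.0
    n = 0
    survival = 1.0  # P(N >= n); term 0 is always included
    while True:
        total += term(n) / survival
        if rng.random() >= q:  # stop with probability 1 - q
            return total
        n += 1
        survival *= q


# Usage: estimate sum_{n>=0} 0.5**n (exact value 2) by averaging estimates.
rng = random.Random(0)
samples = [random_truncation_estimate(lambda n: 0.5 ** n, 0.7, rng)
           for _ in range(100_000)]
estimate = sum(samples) / len(samples)
print(estimate)  # should be close to 2
```

Each individual estimate is noisy, but their average converges to the true sum; choosing `q` trades computation per estimate against variance.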
High Dimensional Bayesian Optimisation and Bandits via Additive Models
Kandasamy, Kirthevasan, Schneider, Jeff, Poczos, Barnabas
Bayesian Optimisation (BO) is a technique for optimising a $D$-dimensional function that is typically expensive to evaluate. While BO has had many successes in low dimensions, scaling it to high dimensions has been notoriously difficult, and the existing literature on the topic considers only very restrictive settings. In this paper, we identify two key challenges in this endeavour and tackle them by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret depends only linearly on $D$, even though the function depends on all $D$ dimensions. We also demonstrate several other statistical and computational benefits of our framework. Via synthetic examples, a scientific simulation and a face detection problem, we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive.
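A common way to encode the additive assumption $f(x) = \sum_j f_j(x^{(j)})$ in the Gaussian-process surrogate that BO typically uses is to sum one kernel per group of coordinates. The sketch below assumes disjoint coordinate groups, a squared-exponential (RBF) kernel on each group and a shared lengthscale; the function names and the grouping are hypothetical, and it illustrates only the surrogate's kernel, not the paper's full optimisation procedure.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, Y: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel matrix between the rows of X and Y."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)


def additive_kernel(X: np.ndarray, Y: np.ndarray, groups: list[list[int]],
                    lengthscale: float = 1.0) -> np.ndarray:
    """Sum of RBF kernels, each acting only on one (assumed disjoint) group of dimensions.

    A GP with this kernel models f(x) = sum_j f_j(x[group_j]), so each
    component only has to be learned in a low-dimensional subspace.
    """
    K = np.zeros((X.shape[0], Y.shape[0]))
    for g in groups:
        K += rbf_kernel(X[:, g], Y[:, g], lengthscale)
    return K


# Usage: D = 6 dimensions split into three hypothetical groups of two.
rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 6))
groups = [[0, 1], [2, 3], [4, 5]]
K = additive_kernel(X, X, groups)
print(K.shape)  # (5, 5)
```

Each group contributes at most 1 to any kernel entry, so the diagonal of `K` equals the number of groups.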
Modeling the Mind: A brief review
Creating an accurate simulation of the mind is no easy task, and while it took brilliant minds decades to bring us to where we are now, we are still a long way from our final goal. It is therefore imperative to carry out more research in this multidisciplinary field, drawing on researchers in biology, neuroscience and computer science, but also mathematics, physics, chemistry and imaging, in order to speed up this process and tip the scales in our favor in the coming decades. This annual review aims to provide the required information for anyone considering this domain as their future endeavor. The reviews will first tackle relatively general characteristics to familiarize the reader with the basic foundations, and will become progressively more specific and more closely aligned with current research in the upcoming parts.
Logistic Regression and Maximum Entropy explained with examples and code
Logistic Regression is one of the most powerful classification methods in machine learning and can be used for a wide variety of tasks. Think of pre-policing or predictive analytics in health: it can be used to aid tuberculosis patients, aid breast cancer diagnosis, etc. Think of modeling urban growth, analysing mortgage prepayments and defaults, forecasting the direction and strength of stock market movements, and even predicting sports outcomes. Reading all of this, the theory[1] of Maximum Entropy Classification might look difficult. In my experience, the average developer does not believe they can design a proper Maximum Entropy / Logistic Regression classifier from scratch. I strongly disagree: not only is the mathematics behind it relatively simple, it can also be implemented in a few lines of code.
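To back up the claim that such a classifier fits in a few lines, here is a minimal sketch of binary logistic regression (the two-class case of a Maximum Entropy classifier) trained with batch gradient descent on the log-loss. It is an illustration under stated assumptions, not the post's own code: the learning rate, iteration count and function names are hypothetical choices, and there is no regularization.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so w[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic_regression(X: np.ndarray, y: np.ndarray,
                            lr: float = 0.5, n_iter: int = 2000) -> np.ndarray:
    """Minimize the average log-loss with plain batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)               # predicted P(y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)    # gradient of the average log-loss
        w -= lr * grad
    return w


def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Class 1 when P(y = 1 | x) >= 0.5, else class 0."""
    return (sigmoid(add_bias(X) @ w) >= 0.5).astype(int)


# Usage on a tiny one-feature toy data set.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
w = fit_logistic_regression(X, y)
print(predict(w, X))  # [0 0 1 1]
```

The gradient `Xb.T @ (p - y)` is the whole "mathematics behind it": the difference between predicted probabilities and labels, weighted by the features.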
Deep Learning: Definition, Resources, Comparison with Machine Learning
Deep learning is sometimes referred to as the intersection between machine learning and artificial intelligence. It is about designing algorithms that can make robots intelligent, such as face recognition techniques used in drones to detect and target terrorists, or pattern recognition / computer vision algorithms to automatically pilot a plane, a train, a boat or a car. Many deep learning algorithms (clustering, pattern recognition, automated bidding, recommendation engines, and so on), even though they appear in new contexts such as IoT or machine-to-machine communication, still rely on relatively old-fashioned techniques such as logistic regression, SVM, decision trees, K-NN, naive Bayes, Bayesian modeling, ensembles, random forests, signal processing, filtering, graph theory, game theory, and many others. Some are new, such as indexation algorithms to automate digital publishing, improve search engines, or create and manage large catalogs such as Amazon's product listing. As a result, many deep learning practitioners call themselves data scientists, computer scientists, statisticians, or sometimes engineers.
Probability Smoothing for Natural Language Processing - Lazy Programmer
This is a very basic technique that can be applied to most machine learning algorithms you will come across when doing NLP. Suppose, for example, that you are creating a "bag of words" model and have just collected data from a set of documents with a very small vocabulary: the words "cat", "dog" and "parrot", each appearing equally often. You would naturally assume that the probability of seeing the word "cat" is 1/3, and similarly P(dog) = 1/3 and P(parrot) = 1/3. Now suppose I want to determine P(mouse). Since "mouse" does not appear in my dictionary, its count is 0, and therefore P(mouse) = 0. If you wanted to do something like calculate a likelihood, you would have P(document) = P(words that are not mouse) × P(mouse) = 0, so a single unseen word wipes out the whole likelihood. To avoid this, we simply add 1 to the numerator and the vocabulary size (V = total number of distinct words) to the denominator of our probability estimate.
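The add-one (Laplace) rule can be written as P(w) = (count(w) + 1) / (N + V), where N is the total number of word tokens. The sketch below applies it to a hypothetical version of the toy example: one occurrence each of "cat", "dog" and "parrot", and an assumed vocabulary of four words that also contains "mouse", so V = 4. The function name is illustrative.

```python
from collections import Counter


def add_one_probability(word: str, counts: Counter, vocab_size: int) -> float:
    """Laplace (add-one) smoothed unigram probability of `word`."""
    total_tokens = sum(counts.values())
    # Counter returns 0 for words it has never seen.
    return (counts[word] + 1) / (total_tokens + vocab_size)


counts = Counter({"cat": 1, "dog": 1, "parrot": 1})  # hypothetical counts
vocabulary = ["cat", "dog", "parrot", "mouse"]       # assumed vocabulary, V = 4

probs = {w: add_one_probability(w, counts, len(vocabulary)) for w in vocabulary}
print(probs["cat"], probs["mouse"])  # 2/7 and 1/7
```

Unseen words now get a small non-zero probability, and because V counts every word in the vocabulary, the smoothed probabilities still sum to 1.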
Stochastic Shortest Path with Energy Constraints in POMDPs
Brázdil, Tomáš, Chatterjee, Krishnendu, Chmelík, Martin, Gupta, Anchit, Novotný, Petr
We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive at every step until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally extended with energy levels.
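To make the objective and the hard constraint concrete, the sketch below scores a single finite run: it accumulates the transition costs (the quantity whose expectation the stochastic shortest path objective minimizes) and rejects the run as soon as the energy level stops being positive. Representing a run as a list of (cost, energy change) pairs, and the function name, are hypothetical choices; this checks one path and is not the authors' RTDP-based solver.

```python
from typing import Optional, Sequence, Tuple


def run_cost_if_energy_safe(initial_energy: int,
                            transitions: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Total cost of a run, or None if the energy level ever drops to 0 or below.

    Each transition is a hypothetical (cost, energy_delta) pair; costs are
    positive integers and energy deltas may be positive or negative.
    """
    if initial_energy <= 0:
        return None
    energy = initial_energy
    total_cost = 0
    for cost, delta in transitions:
        total_cost += cost
        energy += delta
        if energy <= 0:  # hard constraint: energy must stay positive
            return None
    return total_cost


print(run_cost_if_energy_safe(2, [(1, -1), (2, 3), (1, -3)]))  # 4
print(run_cost_if_energy_safe(1, [(1, -1)]))                   # None
```

In the first run the energy goes 2, 1, 4, 1 and never reaches 0, so its cost counts; in the second it hits 0 and the run is infeasible regardless of cost.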
Unsupervised Semantic Action Discovery from Video Collections
Sener, Ozan, Zamir, Amir Roshan, Wu, Chenxia, Savarese, Silvio, Saxena, Ashutosh
Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, an ending, and certain objective steps between them. In this paper, we consider instructional videos, of which there are tens of millions on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that it discovers semantically correct instructions for a variety of tasks.
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Dann, Christoph, Brunskill, Emma
Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee). In this paper, we derive an upper PAC bound $\tilde O(\frac{|\mathcal S|^2 |\mathcal A| H^2}{\epsilon^2} \ln\frac 1 \delta)$ and a lower PAC bound $\tilde \Omega(\frac{|\mathcal S| |\mathcal A| H^2}{\epsilon^2} \ln \frac 1 {\delta + c})$ that match up to log-terms and an additional linear dependency on the number of states $|\mathcal S|$. The lower bound is the first of its kind for this setting. Our upper bound leverages Bernstein's inequality to improve on previous bounds for episodic finite-horizon MDPs which have a time-horizon dependency of at least $H^3$.
Deep Learning For Sequential Data – Part II: Constraints Of Traditional Approaches
In the previous blog post, we discussed the nature of sequential data and why we need a robust, separate modeling technique to analyze it. Traditionally, people have used Hidden Markov Models (HMMs) to analyze sequential data, so we will center the discussion around HMMs in this blog post. HMMs have been applied to many tasks, such as speech recognition, gesture recognition, part-of-speech tagging, and so on. But HMMs place a lot of restrictions on how we can model our data. HMMs are definitely better suited to sequential data than classical machine learning techniques, but they don't fully cover the needs of all modern data analysis.
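One way to see where the restrictions come from is the forward algorithm, which computes the likelihood of an observation sequence under a discrete HMM. The minimal sketch below assumes a first-order HMM with K hidden states and M discrete symbols; the parameter values are made up for illustration. The recursion only ever carries the previous step's vector `alpha` forward, which is exactly the Markov assumption: the next hidden state depends only on the current one, and each observation depends only on its own hidden state.

```python
import numpy as np


def hmm_forward(pi: np.ndarray, A: np.ndarray, B: np.ndarray, obs: list[int]) -> float:
    """Likelihood P(obs) of a discrete HMM via the forward algorithm.

    pi: (K,) initial state distribution
    A:  (K, K) transitions, A[i, j] = P(z_t = j | z_{t-1} = i)
    B:  (K, M) emissions,   B[k, m] = P(x_t = m | z_t = k)
    """
    alpha = pi * B[:, obs[0]]            # alpha_1(k) = P(x_1, z_1 = k)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # sum over previous state, then emit
    return float(alpha.sum())


# Hypothetical two-state, two-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],
              [0.1, 0.9]])
print(hmm_forward(pi, A, B, [0, 1, 1]))
```

For long sequences one would normally rescale `alpha` at each step or work in log space to avoid numerical underflow.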