Learning Graphical Models
Destination Prediction by Trajectory Distribution Based Model
Besse, Philippe C., Guillouet, Brendan, Loubes, Jean-Michel, Royer, Francois
ONITORING and predicting road traffic is of great importance for traffic managers. With the increase of mobile sensors, such as GPS devices and smartphones, much information is at hand to understand urban traffic. In the last few years, a large amount of research has been conducted in order to use this data to model and analyze road traffic conditions. The aim of this paper is to tackle the issue of predicting the destination of vehicles given a prefix of their trajectory. This problem has been the subject of a Kaggle challenge entitled "ECML/PKDD 15: Taxi Trajectory Prediction (I)" [1]. The observations are time-stamped locations that correspond to the different positions of vehicles moving within a city monitored at different observation times. When dealing with a dataset composed of trajectories, the difficulty lies in the fact that the data convey both spatial information (locations of the vehicles on the map of the city) and temporal information (for each vehicle, the locations are indexed by time, which creates a sequence of locations that compose a full trajectory). Hence the data have a spatiotemporal structure that must be taken into account in order to model their evolution while the trajectories of the destination points to be predicted are unknown. Vehicle trajectories are also constrained to a road network which makes their time progression very irregular.
Learning theory estimates with observations from general stationary stochastic processes
Hang, Hanyuan, Feng, Yunlong, Steinwart, Ingo, Suykens, Johan A. K.
This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here by \emph{general}, we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment on analyzing the learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for i.i.d.~processes, our learning rates for ERM recover the optimal rates. On the other hand, for non-i.i.d.~processes including geometrically $\alpha$-mixing Markov processes, geometrically $\alpha$-mixing processes with restricted decay, $\phi$-mixing processes, and (time-reversed) geometrically $\mathcal{C}$-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called "effective number of observations" for various mixing processes.
A Bayesian approach to constrained single- and multi-objective optimization
Feliot, Paul, Bect, Julien, Vazquez, Emmanuel
This article addresses the problem of derivative-free (single- or multi-objective) optimization subject to multiple inequality constraints. Both the objective and constraint functions are assumed to be smooth, non-linear and expensive to evaluate. As a consequence, the number of evaluations that can be used to carry out the optimization is very limited, as in complex industrial design optimization problems. The method we propose to overcome this difficulty has its roots in both the Bayesian and the multi-objective optimization literatures. More specifically, an extended domination rule is used to handle objectives and constraints in a unified way, and a corresponding expected hyper-volume improvement sampling criterion is proposed. This new criterion is naturally adapted to the search of a feasible point when none is available, and reduces to existing Bayesian sampling criteria---the classical Expected Improvement (EI) criterion and some of its constrained/multi-objective extensions---as soon as at least one feasible point is available. The calculation and optimization of the criterion are performed using Sequential Monte Carlo techniques. In particular, an algorithm similar to the subset simulation method, which is well known in the field of structural reliability, is used to estimate the criterion. The method, which we call BMOO (for Bayesian Multi-Objective Optimization), is compared to state-of-the-art algorithms for single- and multi-objective constrained optimization.
What is Human-Centred Machine Learning
This sunday we are running a workshop at ACM CHI 2016 called "Human Centered Machine Learning". I thought I would write an article to explain the general idea (though the workshop itself is a way of better understanding the idea). Statistical Machine Learning is one of the most successful set of techniques to come out of Computer Science in the last decades, and one that a lot of people are thinking about at the moment. It's often presented as quite an impersonal process: machines that learn for themselves, even AI that risk taking over the world. But, in fact, there is a lot of human work that goes into machine learning and not enough people have been talking about that.
Regression, Logistic Regression and Maximum Entropy part 2 (code examples) – Ahmet Taspinar
In the previous blog we have seen the theory and mathematics behind the Maximum Entropy and Logistic Regression Classifiers. Logistic Regression is one of the most powerful classification methods within machine learning and can be used for a wide variety of tasks. Think of pre-policing or predictive analytics in health; it can be used to aid tuberculosis patients, aid breast cancer diagnosis, etc. Think of modeling urban growth, analysing mortgage pre-payments and defaults, forecasting the direction and strength of stock market movement, and even sports. Reading all of this, the theory[1] of Maximum Entropy Classification might look difficult. In my experience, the average Developer does not believe they can design a proper Maximum Entropy / Logistic Regression Classifier from scratch.
The 7 Best Data Science and Machine Learning Podcasts -- The Startup
Data science and machine learning have long been interests of mine, but now that I'm working on Fuzzy.io I need to keep on top of all the news in both fields. My preferred way to do this is through listening to podcasts. I've listened to a bunch of machine learning and data science podcasts in the last few months, so I thought I'd share my favorites: Every other week, they release a 10–15 minute episode where hosts, Kyle and Linda Polich give a short primer on topics like k-means clustering, natural language processing and decision tree learning, often using analogies related to their pet parrot, Yoshi. This is the only place where you'll learn about k-means clustering via placement of parrot droppings.
Bayesian Network-Based Extension for PGP — Estimating Petition Support
Silaghi, Marius (Florida Institute of Technology) | Qin, Song (Florida Institute of Technology) | Matsui, Toshihiro (Nagoya Institute of Technology) | Yokoo, Makoto (Kyushu University)
Consider the problem of estimating the expected number of distinct eligible voters among the authors of a set of electronic signatures gathered for a petition (or citizen initiative) that has to pass legally required thresholds. We formalize this problem and propose an extension to the Pretty Good Privacy Web Of Trust, a mechanism for reciprocally certifying identities between peers. The extension (a) enables agents to certify additional relevant statements about others, and (b) gives agents opportunities for negative authentication statements (e.g., on ineligibility of an identity). A Bayesian Network model enables inferences on the data provided by the proposed PGP extension. Simulations and an agent-based platform are used to validate the concepts.
Learning Continuous State/Action Models for Humanoid Robots
Jackson, Astrid (University of Central Florida) | Sukthankar, Gita (University of Central Florida)
Reinforcement learning (RL) is a popular choice for solving robotic control problems. However, applying RL techniques to controlling humanoid robots with high degrees of freedom remains problematic due to the difficulty of acquiring sufficient training data. The problem is compounded by the fact that most real-world problems involve continuous states and actions. In order for RL to be scalable to these situations it is crucial that the algorithm be sample efficient. Model-based methods tend to be more data efficient than model-free approaches and have the added advantage that a single model can generalize to multiple control problems. This paper proposes a model approximation algorithm for continuous states and actions that integrates case-based reasoning (CBR) and Hidden Markov Models (HMM) to generalize from a small set of state instances. The paper demonstrates that the performance of the learned model is close to that of the system dynamics it approximates, where performance is measured in terms of sampling error.
A Noisy-OR Model for Continuous Time Bayesian Networks
Perreault, Logan (Montana State University) | Strasser, Shane (Montana State University) | Thornton, Monica (Montana State University) | Sheppard, John (Montana State University)
A continuous time Bayesian network is a graphical model capable of describing discrete state systems that evolve in continuous time. Unfortunately, the number of parameters required for each node in the graph is exponential in the number of parents of the node, which can be prohibitively large for many real-world systems. To mitigate this problem, we propose a Noisy-OR model for continuous time Bayesian networks, which can reduce the number of required parameters from exponential to linear. We describe the model, as well as the process required to compute the remaining unspecified parameters. Finally, we experimentally validate the correctness of the proposed Noisy-OR formulation.
Bayesian Networks with Conditional Truncated Densities
Cortijo, Santiago (LIP6 - UPMC) | Gonzales, Christophe (LIP6 - UPMC )
The majority of Bayesian networks learning and inference algorithms rely on the assumption that all random variables are discrete, which is not necessarily the case in real-world problems. In situations where some variables are continuous, a trade-off between the expressive power of the model and the computational complexity of inference has to be done: on one hand, conditional Gaussian models are computationally efficient but they lack expressive power; on the other hand, mixtures of exponentials (MTE), bases or polynomials are expressive but this comes at the expense of tractability. In this paper, we propose an alternative model that lies in between. It is composed of a "discrete" Bayesian network (BN) combined with a set of monodimensional conditional truncated densities modeling the uncertainty over the continuous random variables given their discrete counterpart resulting from a discretization process. We show that inference computation times in this new model are close to those in discrete BNs. Experiments confirm the tractability of the model and highlight its expressive power by comparing it with MTE.