Uncertainty
ml-notes-why-the-log-likelihood-24f7b6c40f83?utm_content=buffer04f75&gi=96bc516d2eed
Secretly, you are hoping that your model will predict future experiences, people call that "generalisation". If we had a sum instead of a product, we could load one datum at a time compute its partial derivatives, accumulating those gradients and apply the optimisation at the end. This little term is what people call the regularisation term, it takes into account your "prior" knowledge of the problem. Notice how engineering problems pushed us to find better notations or better optimisation procedures, surprisingly in machine learning, the basic probability theories are often not that complicated to grasp but the engineering feat to make them actually work are insane.
Deep Generative Models for Relational Data with Side Information
Hu, Changwei, Rai, Piyush, Carin, Lawrence
We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities for each node, providing superior link prediction performance on more complex networks and better interpretability of the latent features; and (2) a regression model which allows directly conditioning the node latent features on the side information available in form of node attributes. Our framework handles both (1) and (2) via a clean, unified model, which enjoys full local conjugacy via data augmentation, and facilitates efficient inference via closed form Gibbs sampling. Moreover, inference cost scales in the number of edges which is attractive for massive but sparse networks. Our framework is also easily extendable to model weighted networks with count-valued edges. We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.
Bayesian Additive Adaptive Basis Tensor Product Models for Modeling High Dimensional Surfaces: An application to high-throughput toxicity testing
Many modern data sets are sampled with error from complex high-dimensional surfaces. Methods such as tensor product splines or Gaussian processes are effective/well suited for characterizing a surface in two or three dimensions but may suffer from difficulties when representing higher dimensional surfaces. Motivated by high throughput toxicity testing where observed dose-response curves are cross sections of a surface defined by a chemical's structural properties, a model is developed to characterize this surface to predict untested chemicals' dose-responses. This manuscript proposes a novel approach that models the multidimensional surface as a sum of learned basis functions formed as the tensor product of lower dimensional functions, which are themselves representable by a basis expansion learned from the data. The model is described, a Gibbs sampling algorithm proposed, and is investigated in a simulation study as well as data taken from the US EPA's ToxCast high throughput toxicity testing platform.
Nonparametric Bayesian label prediction on a graph
Hartog, Jarno, van Zanten, Harry
An implementation of a nonparametric Bayesian approach to solving binary classification problems on graphs is described. A hierarchical Bayesian approach with a randomly scaled Gaussian prior is considered. The prior uses the graph Laplacian to take into account the underlying geometry of the graph. A method based on a theoretically optimal prior and a more flexible variant using partial conjugacy are proposed. Two simulated data examples and two examples using real data are used in order to illustrate the proposed methods.
ChoiceRank: Identifying Preferences from Node Traffic in Networks
Maystre, Lucas, Grossglauser, Matthias
Consider the problem of estimating click probabilities for links between pages of a website, given a hyperlink graph and aggregate statistics on the number of times each page has been visited. Naively, one might expect that the probability of clicking on a particular link should be roughly proportional to the traffic of the link's target. However, this neglects important structural effects: a page's traffic is influenced by a) the number of incoming links, b) the traffic at the pages that link to it, and c) the traffic absorbed by competing links. In order to successfully infer click probabilities, it is therefore necessary to disentangle the preference for a page (i.e., the intrinsic propensity of a user to click on a link pointing to it) from the page's visibility (the exposure it gets from pages linking to it). Building upon recent work by Kumar et al. [2015], we present a statistical framework that tackles a general formulation of the problem: given a network (representing possible transitions between nodes) and the marginal traffic at each node, recover the transition probabilities.
Just Sort It! A Simple and Effective Approach to Active Preference Learning
Maystre, Lucas, Grossglauser, Matthias
We address the problem of learning a ranking by using adaptively chosen pairwise comparisons. Our goal is to recover the ranking accurately but to sample the comparisons sparingly. If all comparison outcomes are consistent with the ranking, the optimal solution is to use an efficient sorting algorithm, such as Quicksort. But how do sorting algorithms behave if some comparison outcomes are inconsistent with the ranking? We give favorable guarantees for Quicksort for the popular Bradley-Terry model, under natural assumptions on the parameters. Furthermore, we empirically demonstrate that sorting algorithms lead to a very simple and effective active learning strategy: repeatedly sort the items. This strategy performs as well as state-of-the-art methods (and much better than random sampling) at a minuscule fraction of the computational cost.
Planning with Abstract Markov Decision Processes
Gopalan, Nakul (Brown University) | desJardins, Marie (University of Maryland) | Littman, Michael L. (Brown University) | MacGlashan, James (Cogitai Incorporated) | Squire, Shawn (University of Maryland) | Tellex, Stefanie (Brown University) | Winder, John (University of Maryland) | Wong, Lawson L.S. (Brown University)
Robots acting in human-scale environments must plan under uncertainty in large state-action spaces and face constantly changing reward functions as requirements and goals change. Planning under uncertainty in large state-action spaces requires hierarchical abstraction for efficient computation. We introduce a new hierarchical planning framework called Abstract Markov Decision Processes (AMDPs) that can plan in a fraction of the time needed for complex decision making in ordinary MDPs. AMDPs provide abstract states, actions, and transition dynamics in multiple layers above a base-level "flat" MDP . AMDPs decompose problems into a series of subtasks with both local reward and local transition functions used to create policies for subtasks. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach allows specification of a decision-making model for a mobile-manipulation problem on a Turtlebot, spanning from low-level control actions operating on continuous variables all the way up through high-level object manipulation tasks.
Multi-Objective Policy Generation for Mobile Robots under Probabilistic Time-Bounded Guarantees
Lacerda, Bruno (School of Computer Science University of Birmingham) | Parker, David (School of Computer Science University of Birmingham) | Hawes, Nick (School of Computer Science University of Birmingham)
We present a methodology for the generation of mobile robot controllers which offer probabilistic time-bounded guarantees on successful task completion, whilst also trying to satisfy soft goals. The approach is based on a stochastic model of the robotโs environment and action execution times, a set of soft goals, and a formal task specification in co-safe linear temporal logic, which are analysed using multi-objective model checking techniques for Markov decision processes. For efficiency, we propose a novel two-step approach. First, we explore policies on the Pareto front for minimising expected task execution time whilst optimising the achievement of soft goals. Then, we use this to prune a model with more detailed timing information, yielding a time-dependent policy for which more fine-grained probabilistic guarantees can be provided. We illustrate and evaluate the generation of policies on a delivery task in a care home scenario, where the robot also tries to engage in entertainment activities with the patients.
Differentially Private Learning of Undirected Graphical Models using Collective Graphical Models
Bernstein, Garrett, McKenna, Ryan, Sun, Tao, Sheldon, Daniel, Hay, Michael, Miklau, Gerome
We investigate the problem of learning discrete, undirected graphical models in a differentially private way. We show that the approach of releasing noisy sufficient statistics using the Laplace mechanism achieves a good trade-off between privacy, utility, and practicality. A naive learning algorithm that uses the noisy sufficient statistics "as is" outperforms general-purpose differentially private learning algorithms. However, it has three limitations: it ignores knowledge about the data generating process, rests on uncertain theoretical foundations, and exhibits certain pathologies. We develop a more principled approach that applies the formalism of collective graphical models to perform inference over the true sufficient statistics within an expectation-maximization framework. We show that this learns better models than competing approaches on both synthetic data and on real human mobility data used as a case study.
Provable benefits of representation learning
Arora, Sanjeev, Risteski, Andrej
There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it's essential to formalize the definition and goals of representation learning, so that they are all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and loglinear models, where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning) and (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data (iii) it is more powerful than manifold learning methods.