Bayesian Inference
Information-Theoretic Representation Learning for Positive-Unlabeled Classification
Sakai, Tomoya, Niu, Gang, Sugiyama, Masashi
In real-world applications, it is conceivable that only positive and unlabeled (PU) data are available for training a classifier. For instance, in land-cover image classification, images of urban regions can be easily labeled, while images of non-urban regions are difficult to annotate due to high diversity of non-urban regions containing, e.g., forest, seas, grasses, and soil (Li et al., 2011). To cope with such situations, PU classification has been actively studied (Letouzey et al., 2000; Elkan and Noto, 2008; du Plessis et al., 2015), and the state-of-the-art method allows us to systematically train deep neural networks only from PU data (Kiryo et al., 2017). However, existing PU classification methods typically require an estimate of the class-prior probability, and their performance is sensitive to the quality of class-prior estimation (Kiryo et al., 2017). Although various class-prior estimation methods from PU data have been proposed so far (du Plessis and Sugiyama, 2014; Ramaswamy et al., 2016; Jain et al., 2016; du Plessis et al., 2017; Northcutt et al., 2017), accurate estimation of the class-prior is still highly challenging particularly for high-dimensional data.
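To illustrate why the class prior matters, here is a minimal sketch of a non-negative PU risk estimate in the style associated with Kiryo et al. (2017). It is not the method proposed in this paper. The function names, the sigmoid loss, and the toy scores are assumptions made for illustration only.

```python
import math


def sigmoid_loss(margin):
    # Sigmoid loss l(z) = 1 / (1 + exp(z)), where z = y * g(x).
    return 1.0 / (1.0 + math.exp(margin))


def nn_pu_risk(scores_pos, scores_unl, class_prior, loss=sigmoid_loss):
    """Non-negative PU risk from classifier scores g(x).

    scores_pos: scores on labeled positive samples.
    scores_unl: scores on unlabeled samples.
    class_prior: assumed P(y = +1); it has to be supplied or estimated.
    """
    # Risk of positives treated as positive, and as negative.
    risk_pos_as_pos = sum(loss(s) for s in scores_pos) / len(scores_pos)
    risk_pos_as_neg = sum(loss(-s) for s in scores_pos) / len(scores_pos)
    # Risk of unlabeled samples treated as negative.
    risk_unl_as_neg = sum(loss(-s) for s in scores_unl) / len(scores_unl)
    # Estimated negative-class risk, clipped at zero.
    neg_part = risk_unl_as_neg - class_prior * risk_pos_as_neg
    return class_prior * risk_pos_as_pos + max(0.0, neg_part)


# Hypothetical scores: the same data gives different risks under
# different assumed class priors.
pos = [2.0, 3.0]
unl = [-2.0, -1.0, 2.5]
risk_low_prior = nn_pu_risk(pos, unl, class_prior=0.3)
risk_high_prior = nn_pu_risk(pos, unl, class_prior=0.6)
```

The class prior enters both terms of the estimate, so a misestimated prior changes the training objective itself.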
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against current state-of-the-art with encouraging performance on real-world benchmark problems where the number of observations and unknowns is in the order of millions.
Physics-constrained, data-driven discovery of coarse-grained dynamics
Felsberger, L., Koutsourelakis, P. S.
The combination of high-dimensionality and disparity of time scales encountered in many problems in computational physics has motivated the development of coarse-grained (CG) models. In this paper, we advocate the paradigm of data-driven discovery for extracting governing equations by employing fine-scale simulation data. In particular, we cast the coarse-graining process under a probabilistic state-space model where the transition law dictates the evolution of the CG state variables and the emission law the coarse-to-fine map. The implied directed probabilistic graphical model suggests that, given values for the fine-grained (FG) variables, probabilistic inference tools must be employed to identify the corresponding values for the CG states, and to that end we employ Stochastic Variational Inference. We advocate a sparse Bayesian learning perspective which avoids overfitting and reveals the most salient features in the CG evolution law. The formulation adopted enables the quantification of a crucial, and often neglected, component in the CG process, i.e. the predictive uncertainty due to information loss. Furthermore, it is capable of reconstructing the evolution of the full, fine-scale system. We demonstrate the efficacy of the proposed framework in high-dimensional systems of random walkers.
Machine Learning in Robotics - 5 Modern Applications
As the term "machine learning" has heated up, interest in "robotics" (as expressed in Google Trends) has not altered much over the last three years. So how much of a place is there for machine learning in robotics? While only a portion of recent developments in robotics can be credited to developments and uses of machine learning, I've aimed to collect some of the more prominent applications together in this article, along with links and references. Before I delve into machine learning in robotics, go ahead and define "robot". Though at first this might seem simple, it's no easy task to come to an agreement on just what a robot is and what it is not, even amongst roboticists.
Bayesian inference for bivariate ranks
Guillotte, Simon, Perron, François, Segers, Johan
A recommender system based on ranks is proposed, where an expert's ranking of a set of objects and a user's ranking of a subset of those objects are combined to make a prediction of the user's ranking of all objects. The rankings are assumed to be induced by latent continuous variables corresponding to the grades assigned by the expert and the user to the objects. The dependence between the expert and user grades is modelled by a copula in some parametric family. Given a prior distribution on the copula parameter, the user's complete ranking is predicted by the mode of the posterior predictive distribution of the user's complete ranking conditional on the expert's complete and the user's incomplete rankings. Various Markov chain Monte-Carlo algorithms are proposed to approximate the predictive distribution or only its mode. The predictive distribution can be obtained exactly for the Farlie-Gumbel-Morgenstern copula family, providing a benchmark for the approximation accuracy of the algorithms. The method is applied to the MovieLens 100k dataset with a Gaussian copula modelling dependence between the expert's and user's grades.
Probabilistic Planning With Influence Diagrams
Lee, Junkyu (University of California, Irvine)
Graphical models provide a powerful framework for reasoning under uncertainty, and an influence diagram (ID) is a graphical model of a sequential decision problem that maximizes the total expected utility of a non-forgetting agent. Relaxing the regular modeling assumptions, an ID can be flexibly extended to general decision scenarios involving a limited memory agent or multi-agents. The approach of probabilistic planning with IDs is expected to gain computational leverage by exploiting the local structure as well as representation flexibility of influence diagram frameworks. My research focuses on graphical model inference for IDs and its application to probabilistic planning, targeting online MDP/POMDP planning as testbeds in the evaluation.
Hawkes Process Inference With Missing Data
Shelton, Christian R. (University of California, Riverside) | Qin, Zhen (University of California, Riverside) | Shetty, Chandini (University of California, Riverside)
A multivariate Hawkes process is a class of marked point processes: A sample consists of a finite set of events of unbounded random size; each event has a real-valued time and a discrete-valued label (mark). It is self-excitatory: Each event causes an increase in the rate of other events (of either the same or a different label) in the (near) future. Prior work has developed methods for parameter estimation from complete samples. However, just as unobserved variables can increase the modeling power of other probabilistic models, allowing unobserved events can increase the modeling power of point processes. In this paper we develop a method to sample over the posterior distribution of unobserved events in a multivariate Hawkes process. We demonstrate the efficacy of our approach, and its utility in improving predictive power and identifying latent structure in real-world data.
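The self-excitatory behaviour described above can be sketched with a simplified univariate Hawkes process. This is only an illustration, not the paper's posterior sampler for unobserved events. The exponential kernel, the parameter names (mu, alpha, beta), and the use of Ogata-style thinning are assumptions.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    # Baseline rate plus an exponentially decaying bump per past event.
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s <= t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on (0, horizon] by thinning.

    Assumes alpha / beta < 1, so that the process does not explode.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its current
        # value is an upper bound until the next event occurs.
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(lam_bar)
        if t > horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=100.0, seed=1)
```

Each accepted event raises the intensity by alpha, which makes further events more likely in the near future. In the multivariate case each label has its own intensity and events of one label can excite the others.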
Anytime Anyspace AND/OR Best-First Search for Bounding Marginal MAP
Lou, Qi (University of California, Irvine) | Dechter, Rina (University of California, Irvine) | Ihler, Alexander (University of California, Irvine)
Marginal MAP is a key task in Bayesian inference and decision-making. It is known to be very difficult in general, particularly because the evaluation of each MAP assignment requires solving an internal summation problem. In this paper, we propose a best-first search algorithm that provides anytime upper bounds for marginal MAP in graphical models. It folds the computation of external maximization and internal summation into an AND/OR tree search framework, and solves them simultaneously using a unified best-first search algorithm. The algorithm avoids some unnecessary computation of summation sub-problems associated with MAP assignments, and thus yields significant time savings. Furthermore, our algorithm is able to operate within limited memory. Empirical evaluation on three challenging benchmarks demonstrates that our unified best-first search algorithm using pre-compiled variational heuristics often provides tighter anytime upper bounds than state-of-the-art baselines.
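To make the "internal summation" concrete, the following brute-force sketch computes marginal MAP on a toy joint distribution over binary variables. It is not the AND/OR best-first search of the paper. The function name and the probability table are hypothetical.

```python
from itertools import product


def marginal_map(joint, n_max, n_sum):
    """Brute-force marginal MAP over binary variables.

    joint(x, y) returns the (unnormalized) probability of the MAP
    variables x and the summed-out variables y, both given as tuples.
    """
    best_assignment, best_value = None, float("-inf")
    for x in product([0, 1], repeat=n_max):
        # Internal summation: marginalize the sum variables for this x.
        value = sum(joint(x, y) for y in product([0, 1], repeat=n_sum))
        # External maximization over the MAP variables.
        if value > best_value:
            best_assignment, best_value = x, value
    return best_assignment, best_value


# Hypothetical joint table over one MAP variable a and one sum variable b.
table = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}
assignment, value = marginal_map(lambda x, y: table[(x[0], y[0])], 1, 1)
```

In this table the most probable joint configuration is (a=0, b=0), yet the marginal MAP assignment is a=1, because summing over b gives 0.60 for a=1 against 0.40 for a=0. The brute-force cost is exponential in both sets of variables, which is the cost that search with bounds tries to avoid.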
Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
Su, Jinyue (Fudan University) | Xu, Jiacheng (Fudan University) | Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)
Generating plausible and fluent sentences with desired properties has long been a challenge. Most recent works use recurrent neural networks (RNNs) and their variants to predict the following words given the previous sequence and a target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method in generating plausible and diverse sentences.
Thompson Sampling for Dynamic Pricing
Ganti, Ravi, Sustik, Matyas, Tran, Quoc, Seaman, Brian
In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to more efficiently learn the underlying parameters in a pricing problem. We apply our algorithms to a real e-commerce system and show that the algorithms indeed improve revenue compared to pricing algorithms that use passive learning.
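As a rough illustration of Thompson sampling in a pricing setting, the sketch below treats each candidate price as a bandit arm with an unknown purchase probability. It is a simplified stand-in, not the algorithm deployed in the paper. The price grid, the Beta-Bernoulli demand model, and the simulated purchase probabilities are assumptions.

```python
import random


def thompson_pricing(prices, true_buy_prob, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed grid of prices.

    true_buy_prob is used only to simulate customer responses; the
    learner never reads it directly.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on the purchase probability at each price.
    alpha = [1.0] * len(prices)
    beta = [1.0] * len(prices)
    counts = [0] * len(prices)
    revenue = 0.0
    for _ in range(rounds):
        # Sample a plausible purchase probability for each price.
        sampled = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        # Offer the price with the highest sampled expected revenue.
        k = max(range(len(prices)), key=lambda i: prices[i] * sampled[i])
        bought = rng.random() < true_buy_prob[k]
        # Posterior update from the observed purchase decision.
        alpha[k] += bought
        beta[k] += 1 - bought
        counts[k] += 1
        revenue += prices[k] * bought
    return counts, revenue


prices = [5.0, 10.0, 15.0]
counts, revenue = thompson_pricing(prices, true_buy_prob=[0.9, 0.6, 0.1])
```

Because prices are chosen from posterior samples rather than from point estimates, uncertain prices still get tried occasionally, which is the active-learning element that a purely passive approach lacks.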