Uncertainty
Learning Registered Point Processes from Idiosyncratic Observations
Xu, Hongteng, Carin, Lawrence, Zha, Hongyuan
A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a "registered" point process that accounts for shared structure, as well as "warping" functions that characterize idiosyncratic aspects of each observed sequence. Under reasonable constraints, in each iteration we update the sample-specific warping functions by solving a set of constrained nonlinear programming problems in parallel, and update the model by maximum likelihood estimation. The justifiability, complexity and robustness of the proposed method are investigated in detail, and the influence of sequence stitching on the learning results is examined empirically. Experiments on both synthetic and real-world data demonstrate that the method yields explainable point process models, achieving encouraging results compared to state-of-the-art methods.
Quantum machine learning: a classical perspective
Ciliberto, Carlo, Herbster, Mark, Ialongo, Alessandro Davide, Pontil, Massimiliano, Rocchetto, Andrea, Severini, Simone, Wossnig, Leonard
Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning techniques to impressive results in regression, classification, data-generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets are motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed-up classical machine learning algorithms. Here we review the literature in quantum machine learning and discuss perspectives for a mixed readership of classical machine learning and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in machine learning are identified as promising directions for the field. Practical questions, like how to upload classical data into quantum form, will also be addressed.
Distributed Bayesian Matrix Factorization with Limited Communication
Qin, Xiangju, Blomstedt, Paul, Leppäaho, Eemeli, Parviainen, Pekka, Kaski, Samuel
Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and their confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show empirically on both real and simulated data that our distributed approach is able to achieve a speed-up of almost an order of magnitude, with a negligible effect on predictive accuracy.
Probabilistic Warnings in National Security Crises: Pearl Harbor Revisited
Blum, David M., Pate-Cornell, M. Elisabeth
Imagine a situation where a group of adversaries is preparing an attack on the United States or U.S. interests. An intelligence analyst has observed some signals, but the situation is rapidly changing. The analyst faces the decision to alert a principal decision maker that an attack is imminent, or to wait until more is known about the situation. This warning decision is based on the analyst's observation and evaluation of signals, independent or correlated, and on her updating of the prior probabilities of possible scenarios and their outcomes. The warning decision also depends on the analyst's assessment of the crisis' dynamics and perception of the preferences of the principal decision maker, as well as the lead time needed for an appropriate response. This article presents a model to support this analyst's dynamic warning decision. As with most problems involving warning, the key is to manage the tradeoffs between false positives and false negatives given the probabilities and the consequences of intelligence failures of both types. The model is illustrated by revisiting the case of the attack on Pearl Harbor in December 1941. It shows that the radio silence of the Japanese fleet carried considerable information (Sir Arthur Conan Doyle's "dog in the night" problem), which was misinterpreted at the time. Even though the probabilities of different attacks were relatively low, their consequences were such that the Bayesian dynamic reasoning described here may have provided valuable information to key decision makers.
Distinguishing Question Subjectivity from Difficulty for Improved Crowdsourcing
Jin, Yuan, Carman, Mark, Zhu, Ye, Buntine, Wray
Their joint effects give rise to the variation in responses to the same question by different crowdworkers. This variation is low when the question is easy to answer and objective, and high when it is difficult and subjective. Unfortunately, current quality control methods for crowdsourcing consider only the question difficulty to account for the variation. As a result, these methods cannot distinguish workers' personal preferences for different correct answers of a partially subjective question from their ability/expertise to avoid objectively wrong answers for that question. To address this issue, we present a probabilistic model which (i) explicitly encodes question difficulty as a model parameter and (ii) implicitly encodes question subjectivity via latent preference factors for crowd-workers. We show that question subjectivity induces grouping of crowd-workers, revealed through clustering of their latent preferences. Moreover, we develop a quantitative measure of the subjectivity of a question. Experiments show that our model (1) improves the performance of both quality control for crowdsourced answers and next answer prediction for crowd-workers, and (2) can potentially provide coherent rankings of questions in terms of their difficulty and subjectivity, so that task providers can refine their designs of the crowdsourcing tasks, e.g. by removing highly subjective questions or inappropriately difficult questions.
Markov Chain Monte Carlo in Python – Towards Data Science
The past few months, I encountered one term again and again in the data science world: Markov Chain Monte Carlo. In my research lab, in podcasts, in articles, every time I heard the phrase I would nod and think that sounds pretty cool with only a vague idea of what anyone was talking about. Several times I tried to learn MCMC and Bayesian inference, but every time I started reading the books, I soon gave up. Exasperated, I turned to the best method to learn any new skill: apply it to a problem. Using some of my sleep data I had been meaning to explore and a hands-on application-based book (Bayesian Methods for Hackers, available free online), I finally learned Markov Chain Monte Carlo through a real-world project.
Information-Theoretic Representation Learning for Positive-Unlabeled Classification
Sakai, Tomoya, Niu, Gang, Sugiyama, Masashi
In real-world applications, it is conceivable that only positive and unlabeled (PU) data are available for training a classifier. For instance, in land-cover image classification, images of urban regions can be easily labeled, while images of non-urban regions are difficult to annotate due to high diversity of non-urban regions containing, e.g., forest, seas, grasses, and soil (Li et al., 2011). To cope with such situations, PU classification has been actively studied (Letouzey et al., 2000; Elkan and Noto, 2008; du Plessis et al., 2015), and the state-of-the-art method allows us to systematically train deep neural networks only from PU data (Kiryo et al., 2017). However, existing PU classification methods typically require an estimate of the class-prior probability, and their performance is sensitive to the quality of class-prior estimation (Kiryo et al., 2017). Although various class-prior estimation methods from PU data have been proposed so far (du Plessis and Sugiyama, 2014; Ramaswamy et al., 2016; Jain et al., 2016; du Plessis et al., 2017; Northcutt et al., 2017), accurate estimation of the class-prior is still highly challenging particularly for high-dimensional data.
Analysis of Thompson Sampling for Gaussian Process Optimization in the Bandit Setting
We further assume that the space X is continuous. Such optimization problems are common in scientific and engineering fields. Examples include learning continuous valuation models (Eric, Freitas and Ghosh, 2008), automatic gait optimization for both quadrupedal and bipedal robots (Lizotte et al., 2007), choosing the optimal derivative of a molecule that best treats a disease (Negoescu, Frazier and Powell, 2011), tuning Hamiltonian based Monte Carlo Samplers (Wang, Mohamed and de Freitas, 2013), etc. A good survey of the problem in practical machine learning applications is presented in Snoek, Larochelle and Adams (2012). We were motivated to study this problem with the application of ranking multiple items on a webpage so as to optimize a diverse range of business metrics like user engagement and revenue from advertisements. In our example, the function f(x) is a utility function composed of various business metrics and x are parameters or knobs that control the relative frequency of different types of items we show on the webpage.
Stochastic quasi-Newton with adaptive step lengths for large-scale problems
We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for stochastic line search adapting the step length. We numerically evaluate and compare against current state-of-the-art with encouraging performance on real-world benchmark problems where the number of observations and unknowns is in the order of millions.
Bridge type classification: supervised learning on a modified NBI dataset
Jootoo, Achyuthan, Lattanzi, David
A key phase in the bridge design process is the selection of the structural system. Due to budget and time constraints, engineers typically rely on engineering judgment and prior experience when selecting a structural system, often considering a limited range of design alternatives. The objective of this study was to explore the suitability of supervised machine learning as a preliminary design aid that provides guidance to engineers with regards to the statistically optimal bridge type to choose, ultimately improving the likelihood of optimized design, design standardization, and reduced maintenance costs. In order to devise this supervised learning system, data for over 600,000 bridges from the National Bridge Inventory database were analyzed. Key attributes for determining the bridge structure type were identified through three feature selection techniques. Potentially useful attributes like seismic intensity and historic data on the cost of materials (steel and concrete) were then added from the US Geological Survey (USGS) database and Engineering News Record. Decision tree, Bayes network and Support Vector Machines were used for predicting the bridge design type. Due to state-to-state variations in material availability, material costs, and design codes, supervised learning models based on the complete data set did not yield favorable results. Supervised learning models were then trained and tested using 10-fold cross validation on data for each state. Inclusion of seismic data improved the model performance noticeably. The data was then resampled to reduce the bias of the models towards more common design types, and the supervised learning models thus constructed showed further improvements in performance. The average recall and precision for the state models was 88.6% and 88.0% using Decision Trees, 84.0% and 83.7% using Bayesian Networks, and 80.8% and 75.6% using SVM.