

Learning to Optimize under Non-Stationarity

arXiv.org Machine Learning

Consider an online platform that allocates advertisements (ads) to a sequence of users. Upon the arrival of a user, the platform has to deliver an ad to the user. The platform earns a unit of profit if the ad is clicked by the user; otherwise, it gains no profit. The platform has full access to the features of the ads and the users. Following (Agrawal and Goyal, 2013), we assume that a user's click behavior towards an ad, or simply the click-through rate (CTR), follows a probability distribution governed by a common linear transformation over the features of the ad and the user. If the platform knew the linear transformation, it would always choose to show the ad with the highest CTR to maximize its profit. In practice, however, this is not the case. The problem of ads allocation is associated with (at least) the following identifiable challenges:
- Uncertainty: The linear transformation is initially unknown to the platform, and due to the randomness of the users' behaviors, the platform cannot simply solve a linear equation to obtain the unknown transformation from a small amount of data. It thus has to learn the underlying linear transformation through the samples.
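A minimal, hedged sketch of the click model described above (not the paper's algorithm): the CTR is a linear score of the joint ad-user feature vector, and the platform can estimate the unknown weight vector from click feedback, here with a plain ridge-regression fit. The feature dimension, the data-generating weights, and the candidate ads are all illustrative.

```python
import numpy as np

# Hedged sketch: estimate the unknown linear CTR model from click feedback.
# The dimension d and theta_true are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
d = 8                                           # dim of joint (ad, user) features
theta_true = rng.normal(size=d) / np.sqrt(d)    # unknown linear transformation

def ctr(x, theta):
    """Click-through rate as a linear score squashed into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-x @ theta))

# Click feedback collected on randomly shown ads.
X = rng.normal(size=(500, d))
clicks = rng.binomial(1, ctr(X, theta_true))

# Ridge-regression estimate of the linear transformation.
lam = 1.0
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ clicks)

# For a new user, show the candidate ad with the highest estimated CTR.
candidate_ads = rng.normal(size=(10, d))
print("chosen ad index:", np.argmax(candidate_ads @ theta_hat))
```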


Semi Parametric Estimations of rotating and scaling parameters for aeronautic loads

arXiv.org Machine Learning

In this paper, we perform registration of noisy curves. We provide an appropriate model for estimating the rotation and scaling parameters that adjust a set of curves, through an M-estimation procedure. We prove the consistency and the asymptotic normality of our estimators. Numerical simulations and a real-life aeronautic example are given to illustrate our methodology.
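A minimal sketch of the kind of registration described above, under illustrative assumptions (a single noisy 2D curve aligned to a template, with a Huber loss as the M-estimation criterion); this is not the paper's estimator, only an illustration of estimating a scaling factor and a rotation angle by M-estimation.

```python
import numpy as np
from scipy.optimize import minimize

# Hedged sketch: register a noisy curve to a template by estimating a
# scaling factor and a rotation angle with an M-estimation (Huber) criterion.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 200)
template = np.stack([np.cos(t), np.sin(2 * t)], axis=1)   # reference curve

true_scale, true_angle = 1.3, 0.4
R = np.array([[np.cos(true_angle), -np.sin(true_angle)],
              [np.sin(true_angle),  np.cos(true_angle)]])
observed = true_scale * template @ R.T + 0.05 * rng.normal(size=template.shape)

def huber(r, delta=0.1):
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def objective(params):
    scale, angle = params
    Rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return huber(observed - scale * template @ Rot.T).sum()

est = minimize(objective, x0=[1.0, 0.0])
print("estimated (scale, angle):", est.x)   # should be close to (1.3, 0.4)
```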


ML-Net: multi-label classification of biomedical texts with deep neural networks

arXiv.org Machine Learning

In multi-label text classification, each textual document can be assigned one or more labels. Because of this, the multi-label text classification task is often considered more challenging than binary or multi-class text classification. As an important task with broad applications in biomedicine, such as assigning diagnosis codes, a number of different computational methods (e.g. training and combining binary classifiers for each label) have been proposed in recent years. However, many suffer from modest accuracy and efficiency, with only limited success in practical use. We propose ML-Net, a novel deep learning framework for multi-label classification of biomedical texts. As an end-to-end system, ML-Net combines a label prediction network with an automated label count prediction mechanism to output an optimal set of labels, leveraging both the predicted confidence score of each label and the contextual information in the target document. We evaluate ML-Net on three independent, publicly available corpora in two kinds of text genres: biomedical literature and clinical notes. For evaluation, example-based measures such as precision, recall, and F-measure are used. ML-Net is compared with several competitive machine learning baseline models. Our benchmarking results show that ML-Net compares favorably to state-of-the-art methods in multi-label classification of biomedical texts. ML-Net is also shown to be robust when evaluated on different text genres in biomedicine. Unlike traditional machine learning methods, ML-Net does not require human effort in feature engineering and is a highly efficient and scalable approach for tasks with a large set of labels (there is no need to build individual classifiers for each separate label). Finally, ML-Net is able to dynamically estimate the label count based on the document context in a more systematic and accurate manner.
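A minimal sketch of the decision rule the abstract describes (rank labels by confidence, keep as many as the label-count prediction says), with illustrative label names, scores, and count; this is not ML-Net's actual architecture, only the top-k selection step.

```python
import numpy as np

# Hedged sketch: combine per-label confidence scores with a predicted label
# count by keeping the top-k highest-scoring labels.  All values are
# illustrative placeholders for the networks' outputs.
label_names = ["diabetes", "hypertension", "asthma", "obesity", "CAD"]
label_scores = np.array([0.91, 0.15, 0.72, 0.40, 0.08])  # per-label confidences
predicted_count = 2                                       # output of the count head

top_k = np.argsort(label_scores)[::-1][:predicted_count]
print([label_names[i] for i in top_k])   # ['diabetes', 'asthma']
```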


Gauges, Loops, and Polynomials for Partition Functions of Graphical Models

arXiv.org Machine Learning

We suggest a new methodology for analysis and computations that combines the gauge transformation (GT) technique from (Chertkov, Chernyak 2006) with the technique developed in (Gurvits 2011, Anari, Gharan 2017, Straszak, Vishnoi 2017) based on recent progress in the field of real stable polynomials. We show that GTs (while keeping the partition function (PF) invariant) allow representation of the PF as a sum of polynomials of variables associated with the edges of the graph. A special belief propagation (BP) gauge makes the singled-out term of the series least sensitive to variations, resulting in the loop series for the PF introduced in (Chertkov, Chernyak 2006). In addition to restating the known results in polynomial form, we also discover a new relation between the computationally tractable BP term (the singled-out term of the loop series evaluated at the BP gauge) and the PF: sequential application of differential operators, each associated with an edge of the graph, to the BP polynomial results in the PF. Each term in the sequence corresponds to a BP polynomial of a modified GM derived by contraction of an edge. Even though the complexity of computing factors in the derived GMs grows exponentially with the number of eliminated edges, the polynomials associated with the new factors remain bi-stable if the original factors have this property. Moreover, we show that BP estimations of the PF do not decrease with eliminations, resulting overall in a new proof of the result, following from a combination of (Anari, Gharan 2017) and (Straszak, Vishnoi 2017), that the BP solution of the original GM with factors corresponding to bi-stable polynomials gives a lower bound on the PF.
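For orientation, the loop series referred to above can be written schematically as below; this is only a restatement of the known (Chertkov, Chernyak 2006) form, not the paper's new polynomial representation, and the notation $\mathcal{C}(G)$, $r_C$ is generic.

```latex
% Schematic loop-series form: the PF equals the tractable BP term times a
% finite correction series indexed by the generalized loops C of the graph G.
Z \;=\; Z_{\mathrm{BP}} \Bigl( 1 + \sum_{C \in \mathcal{C}(G)} r_C \Bigr),
\qquad \mathcal{C}(G) = \{ \text{generalized loops of } G \}.
```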


Effects of Dataset properties on the training of GANs

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) are a new family of generative models, frequently used for generating photorealistic images. The theory promises that the GAN will eventually reach an equilibrium where the generator produces pictures indistinguishable from the training set. In practice, however, a range of problems frequently prevents the system from reaching this equilibrium, with training failing to progress due to instabilities or mode collapse. This paper describes a series of experiments trying to identify patterns in the effect of the training set on the dynamics and eventual outcome of the training. Generating images is a task with many applications. As images are a compact and convenient format for humans to communicate, it is desirable for a computer to be able to generate such images, as this would enable users to understand a wide range of messages and information faster and with ease. While there exist multiple software tools for generating images, for example Photoshop, they are merely a way for a human to translate their idea into an image and take a significant amount of effort and experience.
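For reference, the equilibrium mentioned above is a saddle point of the standard GAN minimax objective; the schematic form below is the generic objective, not anything specific to this paper's experiments.

```latex
% Standard GAN minimax objective (schematic): the generator G and the
% discriminator D play a two-player game whose equilibrium is referred to above.
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```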


Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

arXiv.org Machine Learning

Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data, here we assume logged, real experience and model alternative outcomes of this experience under counterfactual actions, i.e. actions that were not actually taken. Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. CF-GPS can improve on vanilla model-based RL algorithms by making use of available logged data to de-bias model predictions. In contrast to off-policy algorithms based on Importance Sampling, which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. We find empirically that these advantages translate into improved policy evaluation and search results on a nontrivial grid-world task. Finally, we show that CF-GPS generalizes the previously proposed Guided Policy Search and that reparameterization-based algorithms such as Stochastic Value Gradients can be interpreted as counterfactual methods. This example tries to illustrate the everyday human capacity to reason about alternate, counterfactual outcomes of past experience with the goal of "mining worlds that could have been" (Pearl & Mackenzie, 2018). Social psychologists theorize that such cognitive processes are beneficial for improving future decision making (Roese, 1997). In this paper we aim to leverage possible advantages of counterfactual reasoning for learning decision making in the reinforcement learning (RL) framework. In spite of recent success, learning policies with standard, model-free RL algorithms can be notoriously data inefficient. This issue can in principle be addressed by learning policies on data synthesized from a model.
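A minimal sketch of the counterfactual idea described above (not the CF-GPS algorithm itself): if a model has the form s' = f(s, a) + noise, the noise realization can be inferred from a logged transition and then replayed under an action that was not actually taken. The toy dynamics and logged values are illustrative.

```python
# Hedged sketch: abduction of the noise from logged experience, then a
# counterfactual rollout of one step under an alternative action.
def f(state, action):
    return 0.9 * state + action            # toy deterministic part of a model

# One logged transition (state, action taken, next state actually observed).
logged_s, logged_a, logged_s_next = 1.0, 0.5, 1.55

# Abduction: recover the noise consistent with what really happened.
inferred_noise = logged_s_next - f(logged_s, logged_a)

# Intervention + prediction: outcome under an action that was not taken.
counterfactual_a = -0.5
print(f(logged_s, counterfactual_a) + inferred_noise)
```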


Reward-estimation variance elimination in sequential decision processes

arXiv.org Machine Learning

Policy gradient methods are very attractive in reinforcement learning due to their model-free nature and convergence guarantees. These methods, however, suffer from high variance in gradient estimation, resulting in poor sample efficiency. To mitigate this issue, a number of variance-reduction approaches have been proposed. Unfortunately, in challenging problems with delayed rewards, these approaches either bring a relatively modest improvement or reduce variance at the expense of introducing a bias and undermining convergence. The unbiased methods of gradient estimation, in general, only partially reduce variance, without eliminating it completely even in the limit of exact knowledge of the value functions and problem dynamics, as one might have wished. In this work we propose an unbiased method that completely eliminates variance under some commonly encountered conditions. Of practical interest is the limit of deterministic dynamics and small policy stochasticity. In the case of a quadratic value function, as in linear quadratic Gaussian models, the policy randomness need not be small. We use such a model to analyze the performance of the proposed variance-elimination approach and compare it with standard variance-reduction methods. The core idea behind the approach is to use control variates at all future times down the trajectory. We present both model-based and model-free formulations.
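A minimal sketch of the per-timestep control-variate (baseline) idea that the abstract builds on, not the paper's variance-elimination method: subtract a state-dependent baseline from the return at every step of a logged trajectory before forming the policy-gradient estimate. Rewards, log-probability gradients, and the baseline values are illustrative arrays.

```python
import numpy as np

# Hedged sketch: REINFORCE-style gradient estimate with a per-step baseline
# acting as a control variate at each future time along the trajectory.
rng = np.random.default_rng(2)
T, gamma = 5, 0.99
rewards = np.array([1.0, 0.0, 2.0, 0.5, 1.0])
grad_logp = rng.normal(size=(T, 3))             # d/dtheta log pi(a_t | s_t)
baseline = np.array([0.8, 0.9, 1.2, 0.7, 0.6])  # e.g. value estimates V(s_t)

returns = np.array([sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
                    for t in range(T)])

advantages = returns - baseline                 # baseline subtracted per step
grad_estimate = (advantages[:, None] * grad_logp).sum(axis=0)
print(grad_estimate)
```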


Short-Term Wind-Speed Forecasting Using Kernel Spectral Hidden Markov Models

arXiv.org Machine Learning

In machine learning, a nonparametric forecasting algorithm for time series data, called the kernel spectral hidden Markov model (KSHMM), has been proposed. In this paper, we propose a technique for short-term wind-speed prediction based on KSHMM. We numerically compared the performance of our KSHMM-based forecasting technique to other machine learning techniques, using wind-speed data offered by the National Renewable Energy Laboratory. Our results demonstrate that, compared to these methods, the proposed technique offers comparable or better performance.
Keywords: Wind-Speed Prediction, Kernel Methods, Kernel Mean Embedding, Spectral Learning, Hidden Markov Models.
Wind energy is one of the most attractive renewable energy sources.


Adversarial Examples from Cryptographic Pseudo-Random Generators

arXiv.org Machine Learning

In our recent work (Bubeck, Price, Razenshteyn, arXiv:1805.10204) we argued that adversarial examples in machine learning might be due to an inherent computational hardness of the problem. More precisely, we constructed a binary classification task for which (i) a robust classifier exists; yet (ii) no non-trivial accuracy can be obtained with an efficient algorithm in the statistical query model. In the present paper we significantly strengthen both (i) and (ii): we now construct a task which admits (i') a maximally robust classifier (that is, it can tolerate perturbations of size comparable to the size of the examples themselves); and moreover we prove computational hardness of learning this task under (ii') a standard cryptographic assumption.


Neural Predictive Belief Representations

arXiv.org Machine Learning

Unsupervised representation learning has succeeded with excellent results in many applications. It is an especially powerful tool for learning a good representation of environments with partial or noisy observations. In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far. In this paper, we investigate whether it is possible to learn such a belief representation using modern neural architectures. Specifically, we focus on one-step frame prediction and two variants of contrastive predictive coding (CPC) as the objective functions for learning the representations. To evaluate these learned representations, we test how well they can predict various pieces of information about the underlying state of the environment, e.g., the position of the agent in a 3D maze. We show that all three methods are able to learn belief representations of the environment: they encode not only the state information, but also its uncertainty, a crucial aspect of belief states. We also find that, for CPC, multi-step predictions and action-conditioning are critical for accurate belief representations in visually complex environments. The ability of neural representations to capture belief information has the potential to spur new advances in learning and planning in partially observable domains, where leveraging uncertainty is essential for optimal decision making.
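A minimal sketch of the InfoNCE-style objective that contrastive predictive coding optimizes, included only to make the CPC objective mentioned above concrete; the embedding sizes, the bilinear predictor, and the batch are illustrative, and this is not the paper's exact architecture.

```python
import numpy as np

# Hedged sketch: InfoNCE loss with a bilinear score between context embeddings
# and encoded future observations; the diagonal entries are the positive pairs.
rng = np.random.default_rng(3)
batch, dim = 16, 32
context = rng.normal(size=(batch, dim))          # context embeddings c_t
future = rng.normal(size=(batch, dim))           # encoded future observations z_{t+k}
W = rng.normal(size=(dim, dim)) / np.sqrt(dim)   # learned bilinear predictor

scores = context @ W @ future.T                  # (batch, batch) pairwise scores
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
info_nce_loss = -np.mean(np.diag(log_probs))
print(info_nce_loss)
```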