Goto

Collaborating Authors

 Undirected Networks


A Policy Gradient Method with Variance Reduction for Uplift Modeling

arXiv.org Machine Learning

Uplift modeling aims to directly model the incremental impact of a treatment on an individual response. It has been widely and successfully used in healthcare analytics and business operations, where one tries to measure the net effect of a new medicine on patients or to understand the impact of a marketing campaign on company revenue. In this work, we address the problem from a new angle and reformulate it as a Markov Decision Process (MDP). This new formulation allows us to handle the lack of explicit labels, to deal with any number of actions (in comparison to the normal two action uplift modeling), and to apply it to applications with responses of general types, which is a challenging task for previous methods. Furthermore, we also design an unbiased metric for more accurate offline evaluation of uplift effects, set up a better reward function for the policy gradient method to solve the problem and adopt some action-based baselines to reduce variance. We conducted extensive experiments on both a synthetic dataset and real-world scenarios, and showed that our method can achieve significant improvement over previous methods.


Forecasting market states

arXiv.org Machine Learning

In common terminology, there are periods of'bull' market in which prices are more likely to rise and periods of'bear' market in which prices are more likely to fall. These different'states' of markets are commonly attributed in literature to unobservable, orlatent, regimes representing a set of macroeconomic, market and sentiment variables. Many time series models presented in literature tried to capture this phenomenon. Among the most popular methods, it is worth mentioning the TAR models (Tong 1978), trying to estimate'structural breaks' in the time series process, and the Markov Switching models (Hamilton 1989), where the change in regimes are parametrized by means of an unobserved state variable typically modelledas Markov chain. However, the application of TAR models in finance is frequently criticized since it cannot be established with certainty when a structural break has occurred in economic time series and the prior knowledge of major economic events could lead to bias in inference (Campbellet al. 1997). Markov switching models, on the other hand, are highly affected by the curse of dimensionality. In particular, for slightly more complex dynamics than the original proposal (Hamilton 1989), we need to rely on variational inference techniques or MCMC methods (Tsay 2005, Kim and Nelson 1999). This implies that, in a multivariate context and particularly if November 27, 2018 ForecastingMarketStates v2.1


Markov Chain Block Coordinate Descent

arXiv.org Machine Learning

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.


Black-Box Autoregressive Density Estimation for State-Space Models

arXiv.org Machine Learning

State-space models (SSMs) provide a flexible framework for modelling time-series data. Consequently, SSMs are ubiquitously applied in areas such as engineering, econometrics and epidemiology. In this paper we provide a fast approach for approximate Bayesian inference in SSMs using the tools of deep learning and variational inference.


Model-Based Reinforcement Learning in Contextual Decision Processes

arXiv.org Machine Learning

We study the sample complexity of model-based reinforcement learning in general contextual decision processes. We design new algorithms for RL with an abstract model class and analyze their statistical properties. Our algorithms have sample complexity governed by a new structural parameter called the witness rank, which we show to be small in several settings of interest, including Factored MDPs and reactive POMDPs. We also show that the witness rank of a problem is never larger than the recently proposed Bellman rank parameter governing the sample complexity of the model-free algorithm OLIVE (Jiang et al., 2017), the only other provably sample efficient algorithm at this level of generality. Focusing on the special case of Factored MDPs, we prove an exponential lower bound for all model-free approaches, including OLIVE, which when combined with our algorithmic results demonstrates exponential separation between model-based and model-free RL in some rich-observation settings.


Gen-Oja: A Simple and Efficient Algorithm for Streaming Generalized Eigenvector Computation

arXiv.org Machine Learning

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting. We propose a simple and efficient algorithm, Gen-Oja, for these problems. We prove the global convergence of our algorithm, borrowing ideas from the theory of fast-mixing Markov chains and two-time-scale stochastic approximation, showing that it achieves the optimal rate of convergence. In the process, we develop tools for understanding stochastic processes with Markovian noise which might be of independent interest.


Urban Driving with Multi-Objective Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Autonomous driving is a challenging domain that entails multiple aspects: a vehicle should be able to drive to its destination as fast as possible while avoiding collision, obeying traffic rules and ensuring the comfort of passengers. In this paper, we present a deep learning variant of thresholded lexicographic Q-learning for the task of urban driving. Our multi-objective DQN agent learns to drive on multi-lane roads and intersections, yielding and changing lanes according to traffic rules. We also propose an extension for factored Markov Decision Processes to the DQN architecture that provides auxiliary features for the Q function. This is shown to significantly improve data efficiency. We then show that the learned policy is able to zero-shot transfer to a ring road without sacrificing performance. To our knowledge, this is the first reinforcement learning based autonomous driving agent in literature that can handle multi-lane intersections with traffic rules.


Energy Efficiency in Reinforcement Learning for Wireless Sensor Networks

arXiv.org Machine Learning

As sensor networks for health monitoring become more prevalent, so will the need to control their usage and consumption of energy. This paper presents a method which leverages the algorithm's performance and energy consumption. By utilising Reinforcement Learning (RL) techniques, we provide an adaptive framework, which continuously performs weak training in an energy-aware system. We motivate this using a realistic example of residential localisation based on Received Signal Strength (RSS). The method is cheap in terms of work-hours, calibration and energy usage. It achieves this by utilising other sensors available in the environment. These other sensors provide weak labels, which are then used to employ the State-Action-Reward-State-Action (SARSA) algorithm and train the model over time. Our approach is evaluated on a simulated localisation environment and validated on a widely available pervasive health dataset which facilitates realistic residential localisation using RSS. We show that our method is cheaper to implement and requires less effort, whilst at the same time providing a performance enhancement and energy savings over time.


Efficient keyword spotting using dilated convolutions and gating

arXiv.org Machine Learning

We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, allowing to train deeper architectures in resource-constrained configurations. Gated activations and residual connections are also added, following a similar configuration to WaveNet. In addition, we apply a custom target labeling that back-propagates loss from specific frames of interest, therefore yielding higher accuracy and only requiring to detect the end of the keyword. Our experimental results show that our model outperforms a max-pooling loss trained recurrent neural network using LSTM cells, with a significant decrease in false rejection rate. The underlying dataset - "Hey Snips" utterances recorded by over 2.2K different speakers - has been made publicly available to establish an open reference for wake-word detection.


Scalable agent alignment via reward modeling: a research direction

arXiv.org Artificial Intelligence

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task objective. This gives rise to the agent alignment problem: how do we create agents that behave in accordance with the user's intentions? We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning. We discuss the key challenges we expect to face when scaling reward modeling to complex and general domains, concrete approaches to mitigate these challenges, and ways to establish trust in the resulting agents.