Goto

Collaborating Authors

 Undirected Networks


Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling

arXiv.org Machine Learning

Due in part to the growing sources of data about past sequences of decisions and their outcomes - from marketing to energy management to healthcare - there is increasing interest in developing accurate and efficient algorithms for off-policy policy evaluation. For Markov Decision Processes, this problem was addressed (Precup et al., 2000) early on by importance sampling (IS)(Rubinstein, 1981), a method prone to large variance due to rare events (Glynn, 1994; L'Ecuyer et al., 2009). The per-decision importance sampling estimator of Precup et al. (2000) tries to mitigate this problem by leveraging the temporal structure - earlier rewards cannot depend on later decisions - of the domain. While neither importance sampling (IS) nor per-decision IS (PDIS) assumes the underlying domain is Markov, more recently, a new class of estimators (Hallak and Mannor, 2017; Liu et al., 2018; Gelada and Bellemare, 2019) has been proposed that leverages the Markovian structure. In particular, these approaches propose performing importance sampling over the stationary state-action distributions induced by the corresponding Markov chain for a particular policy. By avoiding the explicit accumulation of likelihood ratios along the trajectories, it is hypothesized that such ratios of stationary distributions could substantially reduce the variance of the resulting estimator, thereby overcoming the "curse of horizon" (Liu et al., 2018) plaguing off-policy evaluation. The recent flurry of empirical results shows significant performance improvements over the alternative methods on a variety of simulation domains. Yet so far there has not been a formal analysis of the accuracy of IS, PDIS, and stationary state-action IS which will strengthen our understanding of their properties, benefits and limitations.


Parallelized Training of Restricted Boltzmann Machines using Markov-Chain Monte Carlo Methods

arXiv.org Machine Learning

Restricted Boltzmann Machine (RBM) is a generative stochastic neural network that can be applied to collaborative filtering technique used by recommendation systems. Prediction accuracy of the RBM model is usually better than that of other models for recommendation systems. However, training the RBM model involves Markov-Chain Monte Carlo (MCMC) method, which is computationally expensive. In this paper, we have successfully applied distributed parallel training using Horovod framework to improve the training time of the RBM model. Our tests show that the distributed training approach of the RBM model has a good scaling efficiency. We also show that this approach effectively reduces the training time to little over 12 minutes on 64 CPU nodes compared to 5 hours on a single CPU node. This will make RBM models more practically applicable in recommendation systems.


Hierarchical Hidden Markov Jump Processes for Cancer Screening Modeling

arXiv.org Machine Learning

Hierarchical Hidden Markov Jump Processes for Cancer Screening Modeling Rui Meng Soper Braden Jan Nygard, Mari Nygrad Herbert Lee UCSC LLNL Cancer Registry of Norway UCSC Abstract Hidden Markov jump processes are an attractive approach for modeling clinical disease progression data because they are explainable and capable of handling both irregularly sampled and noisy data. Most applications in this context consider time-homogeneous models due to their relative computational simplicity. However, the time homogeneous assumption is too strong to accurately model the natural history of many diseases. Moreover, the population at risk is not homogeneous either, since disease exposure and susceptibility can vary considerably. In this paper, we propose a piece-wise stationary transition matrix to explain the heterogeneity in time. We propose a hierarchical structure for the heterogeneity in population, where prior information is considered to deal with unbalanced data. Moreover, an efficient, scalable EM algorithm is proposed for inference. We demonstrate the feasibility and superiority of our model on a cervical cancer screening dataset from the Cancer Registry of Norway. Experiments show that our model outperforms state-of-the-art recurrent neural network models in terms of prediction accuracy and significantly outperforms a standard hidden Markov jump process in generating Kaplan-Meier estimators. 1 Introduction Population-based screening programs for identifying undiagnosed individuals have a long history in improving public health. Examples include screening pro-Preliminary work.


Powering Hidden Markov Model by Neural Network based Generative Models

arXiv.org Machine Learning

Hidden Markov model (HMM) has been successfully used for sequential data modeling problems. In this work, we propose to power the modeling capacity of HMM by bringing in neural network based generative models. The proposed model is termed as GenHMM. In the proposed GenHMM, each HMM hidden state is associated with a neural network based generative model that has tractability of exact likel i-hood and provides efficient likelihood computation. A generative model in GenHMM consists of mixture of generators that are realized by flow models. A learning algorithm for GenHMM is proposed in expectation-maximization framework. The convergence of the learning GenHMM is analyzed. We demonstrate the efficiency of GenHMM by classification tasks on practical sequential data.


Thompson Sampling in Non-Episodic Restless Bandits

arXiv.org Machine Learning

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a sub-linear, $O(\sqrt{T}\log T)$, regret bound for a variant of Thompson sampling. Our analysis applies in the infinite time horizon setting, resolving the open question raised by Jung and Tewari (2019) whose analysis is limited to the episodic case. We adopt their policy mapping framework, which allows our algorithm to be efficient and simultaneously keeps the regret meaningful. Our algorithm adapts the TSDE algorithm of Ouyang et al. (2017) in a non-trivial manner to account for the special structure of restless bandits. We test our algorithm on a simulated dynamic channel access problem with several policy mappings, and the empirical regrets agree with the theoretical bound regardless of the choice of the policy mapping.


Generative Neural Network based Spectrum Sharing using Linear Sum Assignment Problems

arXiv.org Machine Learning

Spectrum management and resource allocation (RA) problems are challenging and critical in a vast number of research areas such as wireless communications and computer networks. The traditional approaches for solving such problems usually consume time and memory, especially for large size problems. Recently different machine learning approaches have been considered as potential promising techniques for combinatorial optimization problems, especially the generative model of the deep neural networks. In this work, we propose a resource allocation deep autoencoder network, as one of the promising generative models, for enabling spectrum sharing in underlay device-to-device (D2D) communication by solving linear sum assignment problems (LSAPs). Specifically, we investigate the performance of three different architectures for the conditional variational autoencoders (CVAE). The three proposed architecture are the convolutional neural network (CVAE-CNN) autoencoder, the feed-forward neural network (CVAE-FNN) autoencoder, and the hybrid (H-CVAE) autoencoder. The simulation results show that the proposed approach could be used as a replacement of the conventional RA techniques, such as the Hungarian algorithm, due to its ability to find solutions of LASPs of different sizes with high accuracy and very fast execution time. Moreover, the simulation results reveal that the accuracy of the proposed hybrid autoencoder architecture outperforms the other proposed architectures and the state-of-the-art DNN techniques.


Extracting Incentives from Black-Box Decisions

arXiv.org Artificial Intelligence

An algorithmic decision-maker incentivizes people to act in certain ways to receive better decisions. These incentives can dramatically influence subjects' behaviors and lives, and it is important that both decision-makers and decision-recipients have clarity on which actions are incentivized by the chosen model. While for linear functions, the changes a subject is incentivized to make may be clear, we prove that for many non-linear functions (e.g. neural networks, random forests), classical methods for interpreting the behavior of models (e.g. input gradients) provide poor advice to individuals on which actions they should take. In this work, we propose a mathematical framework for understanding algorithmic incentives as the challenge of solving a Markov Decision Process, where the state includes the set of input features, and the reward is a function of the model's output. We can then leverage the many toolkits for solving MDPs (e.g. tree-based planning, reinforcement learning) to identify the optimal actions each individual is incentivized to take to improve their decision under a given model. We demonstrate the utility of our method by estimating the maximally-incentivized actions in two real-world settings: a recidivism risk predictor we train using ProPublica's COMPAS dataset, and an online credit scoring tool published by the Fair Isaac Corporation (FICO).


Green Deep Reinforcement Learning for Radio Resource Management: Architecture, Algorithm Compression and Challenge

arXiv.org Artificial Intelligence

AI heralds a step-change in the performance and capability of wireless networks and other critical infrastructures. However, it may also cause irreversible environmental damage due to their high energy consumption. Here, we address this challenge in the context of 5G and beyond, where there is a complexity explosion in radio resource management (RRM). On the one hand, deep reinforcement learning (DRL) provides a powerful tool for scalable optimization for high dimensional RRM problems in a dynamic environment. On the other hand, DRL algorithms consume a high amount of energy over time and risk compromising progress made in green radio research. This paper reviews and analyzes how to achieve green DRL for RRM via both architecture and algorithm innovations. Architecturally, a cloud based training and distributed decision-making DRL scheme is proposed, where RRM entities can make lightweight deep local decisions whilst assisted by on-cloud training and updating. On the algorithm level, compression approaches are introduced for both deep neural networks and the underlying Markov Decision Processes, enabling accurate low-dimensional representations of challenges. To scale learning across geographic areas, a spatial transfer learning scheme is proposed to further promote the learning efficiency of distributed DRL entities by exploiting the traffic demand correlations. Together, our proposed architecture and algorithms provide a vision for green and on-demand DRL capability.


Prediction-based Resource Allocation using Bayesian Neural Networks and Minimum Cost and Maximum Flow Algorithm

arXiv.org Artificial Intelligence

Predictive business process monitoring aims at providing predictions about running instances by analyzing logs of completed cases in a business process. Recently, a lot of research focuses on increasing productivity and efficiency in a business process by forecasting potential problems during its executions. However, most of the studies lack suggesting concrete actions to improve the process. They leave it up to the subjective judgment of a user. In this paper, we propose a novel method to connect the results from predictive business process monitoring to actual business process improvements. More in detail, we optimize the resource allocation in a non-clairvoyant online environment, where we have limited information required for scheduling, by exploiting the predictions. The proposed method integrates the offline prediction model construction that predicts the processing time and the next activity of an ongoing instance using Bayesian Neural Networks (BNNs) with the online resource allocation that is extended from the minimum cost and maximum flow algorithm. To validate the proposed method, we performed experiments using an artificial event log and a real-life event log from a global financial organization.


Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

arXiv.org Artificial Intelligence

Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that are unadaptable, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we also theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in Mujoco domains. We also find both low-level and high-level policies trained by our algorithm transferable.