Uncertainty
Finite Sample Analyses for TD(0) With Function Approximation
Dalal, Gal (Technion, Israel Institute of Technology) | Szörényi, Balázs (Technion, Israel Institute of Technology ) | Thoppe, Gugan (Duke University) | Mannor, Shie (Technion, Israel Institute of Technology)
TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Existing convergence rates for Temporal Difference (TD) methods apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Our analyses obviate these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. The two are obtained via different approaches that use relatively unknown, recently developed stochastic approximation techniques.
Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method
Su, Jinyue (Fudan University) | Xu, Jiacheng (Fudan University) | Qiu, Xipeng (Fudan University) | Huang, Xuanjing (Fudan University)
Generating plausible and fluent sentence with desired properties has long been a challenge. Most of the recent works use recurrent neural networks (RNNs) and their variants to predict following words given previous sequence and target label. In this paper, we propose a novel framework to generate constrained sentences via Gibbs Sampling. The candidate sentences are revised and updated iteratively, with sampled new words replacing old ones. Our experiments show the effectiveness of the proposed method to generate plausible and diverse sentences.
Dynamic Determinantal Point Processes
Osogami, Takayuki (IBM Research AI) | Raymond, Rudy (IBM Research AI) | Goel, Akshay (Graduate School of Mathematics, Kyushu University) | Shirai, Tomoyuki (Institute of Mathematics for Industry, Kyushu University) | Maehara, Takanori (RIKEN Center for Advanced Intelligence Project)
The determinantal point process (DPP) has been receiving increasing attention in machine learning as a generative model of subsets consisting of relevant and diverse items. Recently, there has been a significant progress in developing efficient algorithms for learning the kernel matrix that characterizes a DPP. Here, we propose a dynamic DPP, which is a DPP whose kernel can change over time, and develop efficient learning algorithms for the dynamic DPP. In the dynamic DPP, the kernel depends on the subsets selected in the past, but we assume a particular structure in the dependency to allow efficient learning. We also assume that the kernel has a low rank and exploit a recently proposed learning algorithm for the DPP with low-rank factorization, but also show that its bottleneck computation can be reduced from O ( M 2 K ) time to O ( M K 2 ) time, where M is the number of items under consideration, and K is the rank of the kernel, which can be set smaller than M by orders of magnitude.
Hierarchical Policy Search via Return-Weighted Density Estimation
Osa, Takayuki (University of Tokyo / RIKEN) | Sugiyama, Masashi (RIKEN / University of Tokyo)
Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL). Hierarchical RL (HRL) tackles this problem by learning a hierarchicalpolicy, where multiple option policies are in charge of different strategies corresponding to modes of a reward function and a gating policy selects the best option for a given context. Although HRL has been demonstrated to be promising, current state-of-the-art methods cannot still perform well in complex real-world problems due to the difficulty of identifying modes of the reward function. In this paper, we propose a novel method called hierarchical policy search via return-weighted density estimation (HPSDE), which can efficiently identify the modes through density estimation with return-weighted importance sampling. Our proposed method finds option policies corresponding to the modes of the return function and automatically determines the number and the location of option policies, which significantly reduces the burden of hyper-parameters tuning. Through experiments, we demonstrate that the proposed HPSDE successfully learns option policies corresponding to modes of the return function and that it can be successfully applied to a motion planning problem of a redundant robotic manipulator.
Measuring Conditional Independence by Independent Residuals: Theoretical Results and Application in Causal Discovery
Zhang, Hao (Fudan University) | Zhou, Shuigeng (Fudan University) | Guan, Jihong (Tongji University)
We investigate the relationship between conditional independence (CI) x ⊥ y | Z and the independence of two residuals x – E ( x | Z ) ⊥ –E ( y | Z ), where x and y are two random variables, and Z is a set of random variables. We show that if x, y and Z are generated by following linear structural equation model and all external influences follow Gaussian distributions, then x ⊥ y | Z if and only if x – E ( x | Z ) ⊥ y – E ( y | Z ). That is, the test of x ⊥ y | Z can be relaxed to a simpler unconditional independence test of x – E ( x | Z ) ⊥ y – E ( y | Z ). Furthermore, if all these external influences follow non-Gaussian distributions and the model satisfies structural faithfulness condition, then we have x ⊥ y | Z ⇔ x – E ( x | Z ) ⊥ y – E ( y | Z ). We apply the results above to the causal discovery problem, where the causal directions are generally determined by a set of V -structures and their consistent propagations, so CI test-based methods can return a set of Markov equivalence classes. We show that in linear non-Gaussian context, x – E ( x | Z ) ⊥ y – E ( y | Z ) ⇒ x – E ( x | Z ) ⊥ z or y – E ( y | Z ⊥ z (∀ z ∈ Z ) if Z is a minimal d -separator, which implies z causes x (or y ) if z directly connects to x (or y ). Therefore, we conclude that CIs have useful information for distinguishing Markov equivalence classes. In summary, compared with the existing discretization-based and kernel-based CI testing methods, the proposed method provides a simpler way to measure CI, which needs only one unconditional independence test and two regression operations. When being applied to causal discovery, it can find more causal relationships, which is experimentally validated.
SELF: Structural Equational Likelihood Framework for Causal Discovery
Cai, Ruichu (Guangdong University of Technology) | Qiao, Jie ( Guangdong University of Technology ) | Zhang, Zhenjie ( Advanced Digital Sciences Center, Illinois at Singapore Pte. Ltd. ) | Hao, Zhifeng (Guangdong University of Technology)
Causal discovery without intervention is well recognized as a challenging yet powerful data analysis tool, boosting the development of other scientific areas, such as biology, astronomy, and social science. The major technical difficulty behind the observation-based causal discovery is to effectively and efficiently identify causes and effects from correlated variables given the existence of significant noises. Previous studies mostly employ two very different methodologies under Bayesian network framework, namely global likelihood maximization and locally complexity analysis over marginal distributions. While these approaches are effective in their respective problem domains, in this paper, we show that they can be combined to formulate a new global optimization model with local statistical significance, called structural equational likelihood framework (or SELF in short). We provide thorough analysis on the soundness of the model under mild conditions and present efficient heuristic-based algorithms for scalable model training. Empirical evaluations using XGBoost validate the superiority of our proposal over state-of-the-art solutions, on both synthetic and real world causal structures.
Thompson Sampling for Dynamic Pricing
Ganti, Ravi, Sustik, Matyas, Tran, Quoc, Seaman, Brian
In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to more efficiently learn the underlying parameters in a pricing problem. We apply our algorithms to a real e-commerce system and show that the algorithms indeed improve revenue compared to pricing algorithms that use passive learning.
Probabilistic Inference Over Repeated Insertion Models
Kenig, Batya (Technion) | Ilijasić, Lovro (Drexel University) | Ping, Haoyue (Drexel University) | Kimelfeld, Benny (Technion) | Stoyanovich, Julia (Drexel University)
Distributions over rankings are used to model user preferences in various settings including political elections and electronic commerce. The Repeated Insertion Model (RIM) gives rise to various known probability distributions over rankings, in particular to the popular Mallows model. However, probabilistic inference on RIM is computationally challenging, and provably intractable in the general case. In this paper we propose an algorithm for computing the marginal probability of an arbitrary partially ordered set over RIM. We analyze the complexity of the algorithm in terms of properties of the model and the partial order, captured by a novel measure termed the "cover width." We also conduct an experimental study of the algorithm over serial and parallelized implementations. Building upon the relationship between inference with rank distributions and counting linear extensions, we investigate the inference problem when restricted to partial orders that lend themselves to efficient counting of their linear extensions.
Constructing Hierarchical Bayesian Networks With Pooling
Nishino, Kaneharu (The University of Tokyo) | Inaba, Mary (The University of Tokyo)
Inspired by the Bayesian brain hypothesis and deep learning, we develop a Bayesian autoencoder, a method of constructing recognition systems using a Bayesian network. We construct hierarchical Bayesian networks based on feature extraction and implement pooling to achieve invariance within a Bayesian network framework. The constructed networks propagate information bidirectionally between layers. We expect they will be able to achieve brain-like recognition using local features and global information such as their environments.
Investigating the Role of Ensemble Learning in High-Value Wine Identification
Portinale, Luigi (Computer Science Institute, University of Piemonte Orientale) | Locatelli, Monica (University of Piemonte Orientale)
We tackle the problem of authenticating high value Italian wines through machine learning classification. The problem is a seriuos one, since protection of high quality wines from forgeries is worth several million of Euros each year. In a previous work we have identified some base models (in particular classifiers based on Bayesian network (BNC), multi-layer perceptron (MLP) and sequential minimal optimization (SMO)) that well behave using unexpensive chemical analyses of the interested wines. In the present paper, we investigate the role of esemble learning in the construction of more robust classifiers; results suggest that, while bagging and boosting may significantly improve both BNC and MLP, the SMO model is already very robust and efficient as a base learner. We report on results concerning both cross validation on two different datasets, as well as experiments with models trained with the above datasets and tested with a dataset of potentially fake wines; this has been synthesized from a generative probabilistic model learned from real samples and expert knowledge.