Undirected Networks
Cross-domain Imitation from Observations
Raychaudhuri, Dripta S., Paul, Sujoy, van Baar, Jeroen, Roy-Chowdhury, Amit K.
Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDP), most of the existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDP. These discrepancies across domains could include differing dynamics, viewpoint, or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain, to learn this correspondence. We utilize a cycle-consistency constraint on both the state space and a domain agnostic latent space to do this. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer the demonstrations on one domain to the other and use it for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
Variational Gaussian Topic Model with Invertible Neural Projections
Wang, Rui, Zhou, Deyu, Xiong, Yuxuan, Huang, Haiping
Neural topic models have triggered a surge of interest in extracting topics from text automatically since they avoid the sophisticated derivations in conventional topic models. However, scarce neural topic models incorporate the word relatedness information captured in word embedding into the modeling process. To address this issue, we propose a novel topic modeling approach, called Variational Gaussian Topic Model (VaGTM). Based on the variational auto-encoder, the proposed VaGTM models each topic with a multivariate Gaussian in decoder to incorporate word relatedness. Furthermore, to address the limitation that pre-trained word embeddings of topic-associated words do not follow a multivariate Gaussian, Variational Gaussian Topic Model with Invertible neural Projections (VaGTM-IP) is extended from VaGTM. Three benchmark text corpora are used in experiments to verify the effectiveness of VaGTM and VaGTM-IP. The experimental results show that VaGTM and VaGTM-IP outperform several competitive baselines and obtain more coherent topics.
On the $\alpha$-lazy version of Markov chains in estimation and testing problems
Perhaps surprisingly, it appears that despite its extensive usage, seemingly no guidelines exist regarding when moving to the lazy version is appropriate. In this paper we make a first step towards a characterization of such scenarios, beginning with Markov chains statistical estimation and testing problems. Parallel to this work, Chan et al. [2021] gave a unified treatment of Markov chains estimation and testing problems in the single trajectory setting, based on the works of Wolfer and Kontorovich mentioned above. Their results hold for irreducible Markov chains and this was achieved by replacing the pseudo spectral gap, which is defined only for ergodic Markov chains, with the cover time, which is defined for every irreducible Markov chains. They then used deep results that connect the cover time with the blanket time.
Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
Alegre, Lucas N., Bazzan, Ana L. C., da Silva, Bruno C.
Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes performance over a possibly infinite random sequence of Markov Decision Processes (MDPs), each of which drawn from some unknown distribution. We call each such MDP a context. Most related works make strong assumptions such as knowledge about the distribution over contexts, the existence of pre-training phases, or a priori knowledge about the number, sequence, or boundaries between contexts. We introduce an algorithm that efficiently learns policies in non-stationary environments. It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics that reflect whether novel, specialized policies need to be created and deployed to tackle novel contexts, or whether previously-optimized ones might be reused. We show that (i) this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses; and (ii) it bounds the rate of false alarm, which is important in order to minimize regret. Our method constructs a mixture model composed of a (possibly infinite) ensemble of probabilistic dynamics predictors that model the different modes of the distribution over underlying latent MDPs. We evaluate our algorithm on high-dimensional continuous reinforcement learning problems and show that it outperforms state-of-the-art (model-free and model-based) RL algorithms, as well as state-of-the-art meta-learning methods specially designed to deal with non-stationarity.
Markdowns in E-Commerce Fresh Retail: A Counterfactual Prediction and Multi-Period Optimization Approach
Hua, Junhao, Yan, Ling, Xu, Huan, Yang, Cheng
In this paper, by leveraging abundant observational transaction data, we propose a novel data-driven and interpretable pricing approach for markdowns, consisting of counterfactual prediction and multi-period price optimization. Firstly, we build a semi-parametric structural model to learn individual price elasticity and predict counterfactual demand. This semi-parametric model takes advantage of both the predictability of nonparametric machine learning model and the interpretability of economic model. Secondly, we propose a multi-period dynamic pricing algorithm to maximize the overall profit of a perishable product over its finite selling horizon. Different with the traditional approaches that use the deterministic demand, we model the uncertainty of counterfactual demand since it inevitably has randomness in the prediction process. Based on the stochastic model, we derive a sequential pricing strategy by Markov decision process, and design a two-stage algorithm to solve it. The proposed algorithm is very efficient. It reduces the time complexity from exponential to polynomial. Experimental results show the advantages of our pricing algorithm, and the proposed framework has been successfully deployed to the well-known e-commerce fresh retail scenario - Freshippo.
Learning User Embeddings from Temporal Social Media Data: A Survey
Hasan, Fatema, Xu, Kevin S., Foulds, James R., Pan, Shimei
User-generated data on social media contain rich information about who we are, what we like and how we make decisions. In this paper, we survey representative work on learning a concise latent user representation (a.k.a. user embedding) that can capture the main characteristics of a social media user. The learned user embeddings can later be used to support different downstream user analysis tasks such as personality modeling, suicidal risk assessment and purchase decision prediction. The temporal nature of user-generated data on social media has largely been overlooked in much of the existing user embedding literature. In this survey, we focus on research that bridges the gap by incorporating temporal/sequential information in user representation learning. We categorize relevant papers along several key dimensions, identify limitations in the current work and suggest future research directions.
Posterior Regularisation on Bayesian Hierarchical Mixture Clustering
Huang, Weipeng, Ng, Tin Lok James, Laitonjam, Nishma, Hurley, Neil J.
The framework is founded on an approach of minimising the Kullback-Leibler (KL) divergence between a variational solution and the posterior, in a constrained space. The works (Dudík et al., 2004, 2007; Altun and Smola, 2006) first raised the idea of including constraints in maximum entropy density estimation and provided a theoretical analysis. Based on convex duality theory, the optimal solution of the regularised posterior is found to be the original posterior of the model, discounted by the constrained pseudo likelihood introduced by the constraints. Later work founded on the idea of posterior constraints includes (Graça et al., 2009) which proposed constraining the E-step of an Expectation-maximization (EM) algorithm, in order to impose feature constraints on the solution.
Amazon.com: Probability and Statistics for Data Science: Math + R + Data (Chapman & Hall/CRC Data Science Series) (9781138393295): Matloff, Norman: Books
I believe that the book describes itself quite well when it says: Mathematically correct yet highly intuitive…This book would be great for a class that one takes before one takes my statistical learning class. I often run into beginning graduate Data Science students whose background is not math (e.g., CS or Business) and they are not ready…The book fills an important niche, in that it provides a self-contained introduction to material that is useful for a higher-level statistical learning course. I think that it compares well with competing books, particularly in that it takes a more "Data Science" and "example driven" approach than more classical books." "This text by Matloff (Univ. of California, Davis) affords an excellent introduction to statistics for the data science student…Its examples are often drawn from data science applications such as hidden Markov models and remote sensing, to name a few… All the models and concepts are explained well in precise mathematical terms (not presented as formal proofs), to help students gain an intuitive understanding."
A Monotone Approximate Dynamic Programming Approach for the Stochastic Scheduling, Allocation, and Inventory Replenishment Problem: Applications to Drone and Electric Vehicle Battery Swap Stations
Asadi, Amin, Pinkley, Sarah Nurre
There is a growing interest in using electric vehicles (EVs) and drones for many applications. However, battery-oriented issues, including range anxiety and battery degradation, impede adoption. Battery swap stations are one alternative to reduce these concerns that allow the swap of depleted for full batteries in minutes. We consider the problem of deriving actions at a battery swap station when explicitly considering the uncertain arrival of swap demand, battery degradation, and replacement. We model the operations at a battery swap station using a finite horizon Markov Decision Process model for the stochastic scheduling, allocation, and inventory replenishment problem (SAIRP), which determines when and how many batteries are charged, discharged, and replaced over time. We present theoretical proofs for the monotonicity of the value function and monotone structure of an optimal policy for special SAIRP cases. Due to the curses of dimensionality, we develop a new monotone approximate dynamic programming (ADP) method, which intelligently initializes a value function approximation using regression. In computational tests, we demonstrate the superior performance of the new regression-based monotone ADP method as compared to exact methods and other monotone ADP methods. Further, with the tests, we deduce policy insights for drone swap stations.