Goto

Collaborating Authors

 Myrisiotis, Dimitrios


Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

arXiv.org Artificial Intelligence

Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB) for the setting of linear Markov decision processes, and provided theoretical guarantees regarding its running time and regret. In real-world scenarios, however, the space usage of this algorithm can be prohibitive due to a utilized linear regression step. We propose and analyze two modifications of LSVI-UCB, which alternate periods of learning and not-learning, to reduce space and time usage while maintaining sublinear regret. We show experimentally, on synthetic data and real-world benchmarks, that our algorithms achieve low space usage and running time, while not significantly sacrificing regret.


Total Variation Distance Estimation Is as Easy as Probabilistic Inference

arXiv.org Artificial Intelligence

Machine learning and data science heavily rely on probability distributions that are widely used to capture dependencies among large number of variables. Such high-dimensional distributions naturally appear in various domains including neuroscience [ROL02, CTY06], bioinformatics [BB01], text and image processing [Mur22], and causal inference [Pea09]. Substantial research has been devoted to developing models that represent high-dimensional probability distributions succinctly. One prevalent approach is through graphical models. In a graphical model, a graph describes the conditional dependencies among variables and the probability distribution is factorized according to the adjacency relationships in the graph [KF09]. When the underlying graph is a directed graph, the model is known as a Bayesian network or Bayes net. Two fundamental computational tasks on distributions are distance computation and probabilistic inference. In this work, we establish a novel connection between these two seemingly different computational tasks.