Pajic, Miroslav
Transportation-Inequalities, Lyapunov Stability and Sampling for Dynamical Systems on Continuous State Space
Naeem, Muhammad Abdullah, Pajic, Miroslav
We study the concentration phenomenon for discrete-time random dynamical systems with an unbounded state space. We develop a heuristic approach to obtaining exponential concentration inequalities for dynamical systems within an entirely functional-analytic framework. We also show that, in contrast to the purely deterministic setting, the existence of an exponential-type Lyapunov function not only implies stability but also yields exponential concentration inequalities for sampling from the stationary distribution, via a \emph{transport-entropy} (T-E) inequality. These results have significant implications for \emph{reinforcement learning} (RL) and \emph{controls}, leading to exponential concentration inequalities even for unbounded observables, while assuming neither reversibility nor exact knowledge of the random dynamical system (assumptions at the heart of concentration inequalities in statistical mechanics and for Markov diffusion processes).
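For context, the mechanism behind the T-E route can be illustrated with the standard T_1 inequality (a generic sketch with the usual Bobkov-Götze constants, not a formula taken from this paper): if the stationary distribution \mu satisfies T_1(C), then every 1-Lipschitz observable f of i.i.d. samples from \mu concentrates at a Gaussian rate,

    \[
      W_1(\nu,\mu) \;\le\; \sqrt{2\,C\,H(\nu \,\|\, \mu)} \;\;\text{for all } \nu
      \qquad\Longrightarrow\qquad
      \mathbb{P}\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}_\mu[f] \ge t\Big)
      \;\le\; e^{-n t^{2}/(2C)},
    \]

where W_1 is the Wasserstein-1 distance and H(· || \mu) is the relative entropy; dependent (Markovian) samples require additional chain-dependent constants.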
Imputation-Free Learning from Incomplete Observations
Gao, Qitong, Wang, Dong, Amason, Joshua D., Yuan, Siyang, Tao, Chenyang, Henao, Ricardo, Hadziahmetovic, Majda, Carin, Lawrence, Pajic, Miroslav
Although recent works have developed methods that can generate estimates (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most rely on assumptions that may not align with real-world applications and can suffer from poor performance in subsequent tasks. This is particularly true when the data have high missingness rates or a small population. More importantly, the imputation error can propagate into the prediction step that follows, biasing the gradients used to train the prediction models. Consequently, in this work we introduce the importance-guided stochastic gradient descent (IGSGD) method to train multilayer perceptrons (MLPs) and long short-term memory (LSTM) networks to perform inference directly from inputs containing missing values, without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. This not only reduces bias but also allows the models to exploit the underlying information behind the missingness patterns. We test the proposed approach on real-world time-series data (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform traditional two-step imputation-based predictions using state-of-the-art imputation methods.
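The gradient-reweighting idea can be sketched as follows (hypothetical architecture and weighting network; the actual IGSGD policy, reward signal, and model choices are those described in the paper):

    import torch
    import torch.nn as nn

    # Sketch only: an MLP consumes zero-filled values together with the
    # missingness mask (no imputation), and a small weighting network
    # rescales each sample's loss before back-propagation -- a stand-in
    # for the RL-learned importance weights of IGSGD.
    class MaskedMLP(nn.Module):
        def __init__(self, d_in, d_hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * d_in, d_hidden), nn.ReLU(),
                                     nn.Linear(d_hidden, 1))

        def forward(self, x):
            mask = (~torch.isnan(x)).float()      # 1 where a value is observed
            x = torch.nan_to_num(x, nan=0.0)      # zero-fill, do not impute
            return self.net(torch.cat([x, mask], dim=-1)).squeeze(-1)

    model = MaskedMLP(d_in=10)
    weigher = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(list(model.parameters()) + list(weigher.parameters()), lr=1e-2)

    x = torch.randn(32, 10)
    x[torch.rand_like(x) < 0.3] = float('nan')    # simulate missing entries
    y = torch.randn(32)

    per_sample = nn.functional.mse_loss(model(x), y, reduction='none')
    w = torch.softmax(weigher((~torch.isnan(x)).float()).squeeze(-1), dim=0)
    opt.zero_grad()
    (w * per_sample).sum().backward()             # importance-weighted gradients
    opt.step()

In the paper, RL adjusts these gradients, rather than the jointly trained weighting network used in this sketch.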
Learning Optimal Strategies for Temporal Tasks in Stochastic Games
Bozkurt, Alper Kamil, Wang, Yu, Pajic, Miroslav
Linear temporal logic (LTL) is widely used to formally specify complex tasks for autonomous systems. Unlike typical tasks defined only by reward functions, LTL tasks are noncumulative and require memory-dependent strategies. In this work, we introduce a method to learn optimal controller strategies that maximize the satisfaction probability of LTL specifications of the desired tasks in stochastic games, which are natural extensions of Markov Decision Processes (MDPs) to systems with adversarial inputs. Our approach constructs a product game using the deterministic automaton derived from the given LTL task and a reward machine based on the automaton's acceptance condition, thus allowing a model-free RL algorithm to learn an optimal controller strategy. Since the rewards and the transition probabilities of the reward machine do not depend on the number of sets defining the acceptance condition, our approach is scalable to a wide range of LTL tasks, as we demonstrate on several case studies.
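A minimal sketch of the product construction (generic interfaces and a simple accepting-state reward; the actual automaton, acceptance condition, and reward machine are those defined in the paper):

    # Sketch only: product states pair a game state with an automaton state,
    # and a simple reward machine pays out when the automaton reaches an
    # accepting state, so a model-free learner never needs the transition
    # probabilities of the underlying game.
    class ProductGame:
        def __init__(self, game_step, label, aut_delta, accepting, reward=1.0):
            self.game_step = game_step    # (state, ctrl_act, adv_act) -> next game state
            self.label = label            # game state -> set of atomic propositions
            self.aut_delta = aut_delta    # (aut_state, label set) -> next automaton state
            self.accepting = accepting    # set of accepting automaton states
            self.reward = reward

        def step(self, state, ctrl_act, adv_act):
            s, q = state
            s_next = self.game_step(s, ctrl_act, adv_act)
            q_next = self.aut_delta(q, frozenset(self.label(s_next)))
            r = self.reward if q_next in self.accepting else 0.0
            return (s_next, q_next), r

Any off-the-shelf model-free RL algorithm can then treat the pair (s, q) as its state and r as its reward.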
Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning
Bozkurt, Alper Kamil, Wang, Yu, Zavlanos, Michael M., Pajic, Miroslav
[Figure caption: arrows denote the actions up, left, down, and right; encircled characters denote state labels. Actions in states that are unreachable or that lead to another LDBA state are not displayed. In all subfigures, the most likely paths are highlighted in red.] Next to the baby b, the only allowed action is left; when it is taken, (i) the robot hits the wall with probability 0.1 and wakes the baby up, or (ii) the robot moves left with probability 0.8 or moves down with probability 0.1. If the baby has been woken up, which means the robot could not leave within a single time step (represented in LTL as b ∧ ◯b), the robot should notify the adult (at state a); otherwise, the robot should go directly back to the charger (at state c). The full objective φ₂ is specified in LTL as a conjunction of six numbered subformulas over the labels b, a, c, and d encoding these requirements.
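The stochastic outcome of the constrained left action described above can be sampled directly from the stated probabilities (illustrative only; the outcome names are made up):

    import random

    # Sample the outcome of taking 'left' in the cell next to the baby:
    # 0.1 hit the wall (baby wakes up), 0.8 move left, 0.1 move down.
    def step_next_to_baby(rng=random):
        u = rng.random()
        if u < 0.1:
            return 'hit_wall_baby_wakes'
        if u < 0.9:
            return 'moved_left'
        return 'moved_down'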