Undirected Networks
Reinforcement Learning for Task Specifications with Action-Constraints
Raman, Arun, Shagrithaya, Keerthan, Bhatnagar, Shalabh
In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are deemed unsafe (respectively safe). We assume that the set of action sequences deemed unsafe and/or safe is given in terms of a finite-state automaton, and propose a supervisor that disables a subset of actions at every state of the MDP so that the constraints on action sequences are satisfied. Then we present a version of the Q-learning algorithm for learning optimal policies in the presence of non-Markovian action-sequence and state constraints, where we use the development of reward machines to handle the state constraints. We illustrate the method using an example that captures the utility of automata-based methods for non-Markovian state and action specifications for reinforcement learning and show the results of simulations in this setting.
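The supervisor idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the chain MDP, the automaton (which forbids three consecutive "right" actions), and all names below are hypothetical, and the reward-machine handling of state constraints is omitted. The sketch only shows Q-learning over the product of the MDP state and the automaton state, with the supervisor restricting the available actions.

```python
import random

ACTIONS = ["left", "right"]

# Hypothetical automaton over actions: its state counts consecutive "right"
# actions. Reaching UNSAFE would mean the forbidden sequence has occurred.
UNSAFE = 3

def dfa_step(q, action):
    """Advance the automaton state after taking an action."""
    return q + 1 if action == "right" else 0

def allowed_actions(q):
    """Supervisor: enable only actions that keep the automaton out of UNSAFE."""
    return [a for a in ACTIONS if dfa_step(q, a) != UNSAFE]

def env_step(s, action):
    """Toy chain MDP (an assumption) with states 0..4; reward 1 on reaching 4."""
    s_next = min(s + 1, 4) if action == "right" else max(s - 1, 0)
    return s_next, (1.0 if s_next == 4 else 0.0), s_next == 4

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on the product state (MDP state, automaton state)."""
    rng = random.Random(seed)
    Q = {}  # keyed by ((s, q), action)
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(50):
            acts = allowed_actions(q)
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: Q.get(((s, q), b), 0.0))
            s2, r, done = env_step(s, a)
            q2 = dfa_step(q, a)
            # The bootstrap target also ranges over enabled actions only.
            best_next = 0.0 if done else max(
                Q.get(((s2, q2), b), 0.0) for b in allowed_actions(q2))
            old = Q.get(((s, q), a), 0.0)
            Q[((s, q), a)] = old + alpha * (r + gamma * best_next - old)
            s, q = s2, q2
            if done:
                break
    return Q

def greedy_rollout(Q, max_steps=50):
    """Follow the greedy supervised policy and return the action trace."""
    s, q, trace = 0, 0, []
    for _ in range(max_steps):
        a = max(allowed_actions(q), key=lambda b: Q.get(((s, q), b), 0.0))
        trace.append(a)
        s, _, done = env_step(s, a)
        q = dfa_step(q, a)
        if done:
            break
    return trace
```

Because the supervisor filters actions both during exploration and in the greedy rollout, every trace satisfies the action-sequence constraint regardless of how well the Q-values have converged.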
Robust Entropy-regularized Markov Decision Processes
Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with respect to the state transition probabilities, and the estimation of these probabilities may be inaccurate, we study a robust version of the ER-MDP model, where the stochastic optimal policies are required to be robust with respect to the ambiguity in the underlying transition probabilities. Our work is at the crossroads of two important schemes in reinforcement learning (RL), namely, robust MDP and entropy-regularized MDP. We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our setting, making the robust ER-MDP problem tractable. We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy iteration, which would lead to new robust RL and inverse RL algorithms to handle uncertainties. Analyses of computational complexity and error propagation under conventional uncertainty settings are also provided.
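One way to picture how robustness and entropy regularization combine in value iteration is the sketch below. It rests on assumptions not stated in the abstract: the uncertainty set is a finite list of candidate transition kernels, the worst case is taken independently for each state-action pair, and the entropy term is handled by the standard log-sum-exp (soft) backup with temperature `tau`. The function and parameter names are hypothetical, and the paper's formulation may differ.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def robust_soft_value_iteration(rewards, models, gamma=0.9, tau=1.0, iters=200):
    """Soft value iteration with a worst case over candidate models.

    rewards[s][a] is the reward for action a in state s.
    models is a list of kernels P, where P[s][a][s2] is a transition probability.
    Returns the soft values V and the softmax policy derived from them.
    """
    n_states = len(rewards)
    V = [0.0] * n_states
    Q = [[0.0] * len(rewards[s]) for s in range(n_states)]
    for _ in range(iters):
        # Worst-case expected next value, chosen separately per (s, a).
        Q = [[r_sa + gamma * min(sum(p * v for p, v in zip(P[s][a], V))
                                 for P in models)
              for a, r_sa in enumerate(rewards[s])]
             for s in range(n_states)]
        # Soft (entropy-regularized) maximum over actions.
        V = [tau * logsumexp([q / tau for q in Q[s]]) for s in range(n_states)]
    # Stochastic policy: pi(a|s) proportional to exp(Q(s, a) / tau).
    policy = [[math.exp((q - V[s]) / tau) for q in Q[s]] for s in range(n_states)]
    return V, policy
```

Passing a single kernel in `models` recovers ordinary soft value iteration, so the effect of the uncertainty set can be inspected by comparing the two resulting value vectors.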
Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression
Jackson, John, Laurenti, Luca, Frew, Eric, Lahijanian, Morteza
Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties and black-box components that influence the system dynamics. In this article, we develop a framework for verifying partially-observable, discrete-time dynamical systems with unmodelled dynamics against temporal logic specifications from a given input-output dataset. The verification framework employs Gaussian process (GP) regression to learn the unknown dynamics from the dataset and abstract the continuous-space system as a finite-state, uncertain Markov decision process (MDP). This abstraction relies on space discretization and transition probability intervals that capture the uncertainty due to the error in GP regression by using reproducing kernel Hilbert space analysis as well as the uncertainty induced by discretization. The framework utilizes existing model checking tools for verification of the uncertain MDP abstraction against a given temporal logic specification. We establish the correctness of extending the verification results on the abstraction to the underlying partially-observable system. We show that the computational complexity of the framework is polynomial in the size of the dataset and discrete abstraction. The complexity analysis illustrates a trade-off between the quality of the verification results and the computational burden to handle larger datasets and finer abstractions. Finally, we demonstrate the efficacy of our learning and verification framework on several case studies with linear, nonlinear, and switched dynamical systems.
What is Event Knowledge Graph: A Survey
Guan, Saiping, Cheng, Xueqi, Bai, Long, Zhang, Fujun, Li, Zixuan, Zeng, Yutao, Jin, Xiaolong, Guo, Jiafeng
Besides entity-centric knowledge, usually organized as a Knowledge Graph (KG), events are also an essential kind of knowledge in the world, which has prompted the rise of event-centric knowledge representation forms such as the Event KG (EKG). EKG plays an increasingly important role in many machine learning and artificial intelligence applications, such as intelligent search, question-answering, recommendation, and text generation. This paper provides a comprehensive survey of EKG from history, ontology, instance, and application views. Specifically, to characterize EKG thoroughly, we focus on its history, definitions, schema induction, acquisition, related representative graphs/systems, and applications. The development processes and trends are studied therein. We further summarize prospective directions to facilitate future research on EKG.
On some Foundational Aspects of Human-Centered Artificial Intelligence
Serafini, Luciano, Barbosa, Raul, Grosinger, Jasmin, Iocchi, Luca, Napoli, Christian, Rinzivillo, Salvatore, Robin, Jacques, Saffiotti, Alessandro, Scantamburlo, Teresa, Schueller, Peter, Traverso, Paolo, Vazquez-Salceda, Javier
The burgeoning of AI has prompted recommendations that AI techniques should be "human-centered". However, there is no clear definition of what is meant by Human-Centered Artificial Intelligence, or for short, HCAI. This paper aims to improve this situation by addressing some foundational aspects of HCAI. To do so, we introduce the term HCAI agent to refer to any physical or software computational agent that is equipped with AI components and that interacts and/or collaborates with humans. This article identifies five main conceptual components that participate in an HCAI agent: Observations, Requirements, Actions, Explanations and Models. We see the notion of HCAI agent, together with its components and functions, as a way to bridge the technical and non-technical discussions on human-centered AI. In this paper, we focus our analysis on scenarios consisting of a single agent operating in dynamic environments in the presence of humans.
Time varying regression with hidden linear dynamics
Jadbabaie, Ali, Mania, Horia, Shah, Devavrat, Sra, Suvrit
The distribution of labels given the covariates changes over time in a variety of applications of regression. Example domains where such problems arise include economics, marketing, fashion, and supply chain optimization, where market properties evolve over time. Motivated by such problems, we revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system. One way to account for distribution change in linear regression is to assume that the unknown model parameters change slowly with time [2, 15, 37]. While this assumption simplifies the problem and makes it tractable, it does not exploit additional available structure, and it fails to model the periodicity (e.g., due to seasonality) present in some problems. As an alternative, we are interested in a dynamic model previously studied by Chow [7], Carraro [5], and Shumway et al. [26].
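The data-generating model can be made concrete with a small simulation. This is a sketch under assumptions not given in the text above: labels follow y_t = <x_t, theta_t> + noise, the parameters follow theta_{t+1} = A theta_t with no process noise, and A is a 2-D rotation chosen so that the parameters repeat every 12 steps, mimicking seasonality. All function names are hypothetical.

```python
import math
import random

def rotation(angle):
    """2x2 rotation matrix, used here as an example of periodic dynamics."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def simulate(A, theta0, steps, noise_std=0.1, seed=0):
    """Generate (x_t, y_t, theta_t) triples.

    Labels follow y_t = <x_t, theta_t> + Gaussian noise, and the hidden
    parameters evolve as theta_{t+1} = A theta_t (assumed noise-free).
    """
    rng = random.Random(seed)
    theta, data = list(theta0), []
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in theta]
        y = sum(xi * ti for xi, ti in zip(x, theta)) + rng.gauss(0.0, noise_std)
        data.append((x, y, list(theta)))
        theta = matvec(A, theta)
    return data

# Parameters that rotate by 30 degrees per step repeat every 12 steps.
A = rotation(2.0 * math.pi / 12.0)
data = simulate(A, [1.0, 0.0], steps=25)
```

A slowly-varying-parameter assumption would struggle with this example, since the parameters swing through a full cycle rather than drifting, whereas knowledge of A explains the change exactly.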
Monte Carlo EM for Deep Time Series Anomaly Detection
Aubet, François-Xavier, Zügner, Daniel, Gasthaus, Jan
Time series data are often corrupted by outliers or other kinds of anomalies. Identifying the anomalous points can be a goal in its own right (anomaly detection), or a means of improving the performance of other time series tasks (e.g. forecasting). Recent deep-learning-based approaches to anomaly detection and forecasting commonly assume that the proportion of anomalies in the training data is small enough to ignore, and treat the unlabeled data as coming from the nominal data distribution. We present a simple yet effective technique for augmenting existing time series models so that they explicitly account for anomalies in the training data. By augmenting the training data with a latent anomaly indicator variable whose distribution is inferred while training the underlying model using Monte Carlo EM, our method simultaneously infers anomalous points while improving model performance on nominal data. We demonstrate the effectiveness of the approach by combining it with a simple feed-forward forecasting model. We investigate how anomalies in the training set affect the training of forecasting models, which are commonly used for time series anomaly detection, and show that our method improves the training of the model.
Dynamic programming with partial information to overcome navigational uncertainty in a nautical environment
Beeler, Chris, Li, Xinkai, Crowley, Mark, Fraser, Maia, Tamblyn, Isaac
Uncertainty creates a major obstacle in solving control problems. The goal of these problems is to construct a policy that is expected to produce optimal trajectories. In some cases, uncertainty only causes deviations from the optimal trajectory, which may still result in an acceptable solution. In an MDP, the state of the system is known; in a POMDP, however, it must be estimated, leading to some amount of uncertainty. Much of the difficulty in solving a POMDP stems from estimating the state of the system before choosing an action. This is where the majority of ...
Learning Based Task Offloading in Digital Twin Empowered Internet of Vehicles
Zheng, Jinkai, Luan, Tom H., Gao, Longxiang, Zhang, Yao, Wu, Yuan
Mobile edge computing has become an effective and fundamental paradigm for futuristic autonomous vehicles to offload computing tasks. However, due to the high mobility of vehicles, the dynamics of the wireless conditions, and the uncertainty of arriving computing tasks, it is difficult for a single vehicle to determine the optimal offloading strategy. In this paper, we propose a Digital Twin (DT) empowered task offloading framework for the Internet of Vehicles. As a software agent residing in the cloud, a DT can obtain both global network information by using communications among DTs, and historical information of a vehicle by using the communications within the twin. The global network information and historical vehicular information can significantly facilitate the offloading. Specifically, to preserve the precious computing resources at different levels for the most appropriate computing tasks, we integrate a learning scheme based on the prediction of future computing tasks in the DT. Accordingly, we model the offloading scheduling process as a Markov Decision Process (MDP) to minimize the long-term cost in terms of a trade-off between task latency, energy consumption, and the renting cost of clouds. Simulation results demonstrate that our algorithm can effectively find the optimal offloading strategy and achieve fast convergence and high performance compared with other existing approaches.
Exponential Family Model-Based Reinforcement Learning via Score Matching
Li, Gene, Li, Junbo, Srebro, Nathan, Wang, Zhaoran, Yang, Zhuoran
This paper studies the regret minimization problem for finite horizon, episodic reinforcement learning (RL) with infinitely large state and action spaces. Empirically, RL has achieved success in diverse domains, even when the problem size (measured in the number of states and actions) explodes [35, 44, 28]. The key to developing sample-efficient algorithms is to leverage function approximation, enabling us to generalize across different state-action pairs. Much theoretical progress has been made towards understanding function approximation in RL. Existing theory typically requires strong linearity assumptions on transition dynamics [e.g., 55, 26, 10, 36] or action-value functions [e.g., 30, 57] of the Markov Decision Process (MDP). However, most real world problems are nonlinear, and our theoretical understanding of these settings remains limited. Thus, we ask the question: Can we design provably efficient RL algorithms in nonlinear environments? Recently, Chowdhury et al. [13] introduced a nonlinear setting where the state-transition measures are finitely parameterized exponential family models, and they proposed to estimate model parameters via maximum likelihood estimation (MLE). The exponential family is a well-studied and powerful statistical framework, so it is a natural model class to consider beyond linear models.