Markov Models
Unsupervised model-free representation learning
Numerous control and learning problems face the situation where sequences of high-dimensional highly dependent data are available but no or little feedback is provided to the learner, which makes any inference rather challenging. To address this challenge, we formulate the following problem. Given a series of observations $X_0,\dots,X_n$ coming from a large (high-dimensional) space $\mathcal X$, find a representation function $f$ mapping $\mathcal X$ to a finite space $\mathcal Y$ such that the series $f(X_0),\dots,f(X_n)$ preserves as much information as possible about the original time-series dependence in $X_0,\dots,X_n$. We show that, for stationary time series, the function $f$ can be selected as the one maximizing a certain information criterion that we call time-series information. Some properties of this functions are investigated, including its uniqueness and consistency of its empirical estimates. Implications for the problem of optimal control are presented.
Variational Recurrent Models for Solving Partially Observable Control Tasks
Han, Dongqi, Doya, Kenji, Tani, Jun
In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from unsatisfactory performance, since two problems need to be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent model (VRM) for modeling the environment, and an RL controller that has access to both the environment and the VRM. The proposed algorithm was tested in two types of PO robotic control tasks, those in which either coordinates or velocities were not observable and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned more optimal policy than other alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner.
Artificial Intelligence in Surgery
Zhou, Xiao-Yun, Guo, Yao, Shen, Mali, Yang, Guang-Zhong
The Hamlyn Centre for Robotic Surgery, Imperial College London, UK 2. Institute of Medical Robotics, Shanghai Jiao Tong University, ChinaAbstract Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from preoperative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery. Keywords: Artificial intelligence, Surgical autonomy, Medical robotics, Deep learning 1. Introduction Advances in surgery have made a significant impact on the management of both acute and chronic diseases, prolonging life and continuously extending the boundary of survival. These advances are underpinned by continuing technological developments in diagnosis, imaging, and surgical instrumentation. Complex surgical navigation and planning are made possible through the use of both pre-and intra-operative imaging techniques such as ultrasound, Computed Tomography (CT), and Magnetic Resonance Imaging Preprint submitted to Frontiers of Medicine January 6, 2020 arXiv:2001.00627v1 Many terminal illnesses have been transformed into clinically manageable chronic lifelong conditions and increasing surgery is focused on the systematic level impact on patients, avoiding isolated surgical treatment or anatomical alteration, with careful consideration of metabolic, haemodynamic and neurohormonal consequences that can influence the quality of life. For recent advances in medicine, AI has played an important role in clinical decision support since the early years of developing the MYCIN system [5]. AI is now increasingly used for risk stratification, genomics, imaging and diagnosis, precision medicine, and drug discovery. The introduction of AI in surgery is more recent and it has a strong root in imaging and navigation, with early techniques focused on feature detection and computer assisted intervention for both preoperative planning and intra-operative guidance. Over the years, supervised algorithms such as active shape models, atlas based methods and statistical classifiers have been developed [1]. With recent successes of AlexNet [6], deep learning methods, especially Deep Con-volutional Neural Network (DCNN) where multiple convolutional layers are cascaded, have enabled automatically learned data-driven descriptors, rather than ad hoc handcrafted features, to be used for image understanding with improved robustness and generalizability.
A Regression Framework for Predicting User's Next Location using Call Detail Records
Mahdizadeh, Mohammad Saleh, Bahrak, Behnam
With the growth of using cell phones and the increase in diversity of smart mobile devices, a massive volume of data is generated continuously in the process of using these devices. Among these data, Call Detail Records, CDR, is highly remarkable. Since CDR contains both temporal and spatial labels, mobility analysis of CDR is one of the favorite subjects of study among the researchers. The user next location prediction is one of the main problems in the field of human mobility analysis. In this paper, we propose a data processing framework to predict user next location. We propose domain-specific data processing strategies and design a deep neural network model which is based on recurrent neurons and perform regression tasks. Using this prediction framework, the error of the prediction decreases from 74% to 55% in comparison to the worst and best performing traditional models. Methods, strategies, the framework and the results of this paper can be helpful in many applications such as urban planning and digital marketing.
Direct and indirect reinforcement learning
Guan, Yang, Li, Shengbo Eben, Duan, Jingliang, Li, Jie, Ren, Yangang, Cheng, Bo
Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. In this paper, we classify RL into direct and indirect methods according to how they seek optimal policy of the Markov Decision Process (MDP) problem. The former solves optimal policy by directly maximizing an objective function using gradient descent method, in which the objective function is usually the expectation of accumulative future rewards. The latter indirectly finds the optimal policy by solving the Bellman equation, which is the sufficient and necessary condition from Bellman's principle of optimality. We take vanilla policy gradient and approximate policy iteration to study their internal relationship, and reveal that both direct and indirect methods can be unified in actor-critic architecture and are equivalent if we always choose stationary state distribution of current policy as initial state distribution of MDP. Finally, we classify the current mainstream RL algorithms and compare the differences between other criteria including value-based and policy-based, model-based and model-free.
Can Agents Learn by Analogy? An Inferable Model for PAC Reinforcement Learning
Model-based reinforcement learning algorithms make decisions by building and utilizing a model of the environment. However, none of the existing algorithms attempts to infer the dynamics of any state-action pair from known state-action pairs before meeting it for sufficient times. We propose a new model-based method called Greedy Inference Model (GIM) that infers the unknown dynamics from known dynamics based on the internal spectral properties of the environment. In other words, GIM can "learn by analogy". We further introduce a new exploration strategy which ensures that the agent rapidly and evenly visits unknown state-action pairs. GIM is much more computationally efficient than state-of-the-art model-based algorithms, as the number of dynamic programming operations is independent of the environment size. Lower sample complexity could also be achieved under mild conditions compared against methods without inferring. Experimental results demonstrate the effectiveness and efficiency of GIM in a variety of real-world tasks.
Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Roy, Arghyadip, Borkar, Vivek, Karandikar, Abhay, Chaporkar, Prasanna
Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the curse of dimensionality and the curse of modeling. To overcome these issues, Reinforcement Learning (RL) methods are adopted in practice. In this paper, we aim to obtain the optimal admission control policy in a system where different classes of customers are present. Using DP techniques, we prove that it is optimal to admit the $i$ th class of customers only upto a threshold $\tau(i)$ which is a non-increasing function of $i$. Contrary to traditional RL algorithms which do not take into account the structural properties of the optimal policy while learning, we propose a structure-aware learning algorithm which exploits the threshold structure of the optimal policy. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the structure-aware learning algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results also establish the gain in the convergence rate of the proposed algorithm over other RL algorithms. The techniques presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and communication networking.
Optimal Best Markovian Arm Identification with Fixed Confidence
We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models. We derive instance specific nonasymptotic and asymptotic lower bounds which generalize those of the IID setting. We analyze the Track-and-Stop strategy, initially proposed for the IID setting, and we prove that asymptotically it is at most a factor of four apart from the lower bound. Our one-parameter Markovian bandit model is based on the notion of an exponential family of stochastic matrices for which we establish many useful properties. For the analysis of the Track-and-Stop strategy we derive a novel concentration inequality for Markov chains that may be of interest in its own right.
Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning
Li, Sheng, Egorov, Maxim, Kochenderfer, Mykel
New methodologies will be needed to ensure the airspace remains safe and efficient as traffic densities rise to accommodate new unmanned operations. This paper explores how unmanned free-flight traffic may operate in dense airspace. We develop and analyze autonomous collision avoidance systems for aircraft operating in dense airspace where traditional collision avoidance systems fail. We propose a metric for quantifying the decision burden on a collision avoidance system as well as a metric for measuring the impact of the collision avoidance system on airspace. We use deep reinforcement learning to compute corrections for an existing collision avoidance approach to account for dense airspace. The results show that a corrected collision avoidance system can operate more efficiently than traditional methods in dense airspace while maintaining high levels of safety.
An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models
Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both transition probabilities between the latent states and mean parameter of the probability distribution of the responses (for a given state) depend on the set of $p$ covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with $4^p$ potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASA-EM) for joint optimization of models and their parameters with respect to a criterion of interest.