Markov Models
Aligning an optical interferometer with beam divergence control and continuous action space
Makarenko, Stepan, Sorokin, Dmitry, Ulanov, Alexander, Lvovsky, A. I.
Reinforcement learning is finding its way to real-world problem application, transferring from simulated environments to physical setups. In this work, we implement vision-based alignment of an optical Mach-Zehnder interferometer with a confocal telescope in one arm, which controls the diameter and divergence of the corresponding beam. We use a continuous action space; exponential scaling enables us to handle actions within a range of over two orders of magnitude. Our agent trains only in a simulated environment with domain randomizations. In an experimental evaluation, the agent significantly outperforms an existing solution and a human expert.
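The abstract does not give the scaling formula, so the following is only a minimal sketch of one way exponential action scaling could work. The function name `exp_scale` and the bounds `min_step` and `max_step` are hypothetical. A normalised policy output in [-1, 1] is mapped to a signed step whose magnitude grows exponentially, so a single bounded output can cover step sizes spanning more than two orders of magnitude.

```python
import math

def exp_scale(a, min_step=1e-3, max_step=0.5):
    """Map a normalised action a in [-1, 1] to a signed step size.

    Hypothetical mapping: the magnitude grows exponentially with |a|,
    from min_step (as |a| -> 0) up to max_step (at |a| = 1).
    """
    if not -1.0 <= a <= 1.0:
        raise ValueError("action must lie in [-1, 1]")
    if a == 0.0:
        return 0.0  # treat an exactly zero action as "do not move"
    magnitude = min_step * (max_step / min_step) ** abs(a)
    return math.copysign(magnitude, a)

if __name__ == "__main__":
    for a in (-1.0, -0.5, 0.1, 0.5, 1.0):
        print(f"a = {a:+.1f} -> step = {exp_scale(a):+.5f}")
```

With these illustrative bounds the ratio between the largest and smallest non-zero step is 500, so small outputs give fine adjustments and large outputs give coarse ones.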
A Systematic Survey of Text Worlds as Embodied Natural Language Environments
Text Worlds are virtual environments for embodied agents that, unlike 2D or 3D environments, are rendered exclusively using textual descriptions. These environments offer an alternative to higher-fidelity 3D environments due to their low barrier to entry, providing the ability to study semantics, compositional inference, and other high-level tasks with rich high-level action spaces while controlling for perceptual input. This systematic survey outlines recent developments in tooling, environments, and agent modeling for Text Worlds, while examining recent trends in knowledge graphs, common sense reasoning, transfer learning of Text World performance to higher-fidelity environments, as well as near-term development targets that, once achieved, make Text Worlds an attractive general research paradigm for natural language processing.
AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning
Huang, Biwei, Feng, Fan, Lu, Chaochao, Magliacane, Sara, Zhang, Kun
Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we construct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the environment model, we characterize a minimal set of representations, including both domain-specific factors and domain-shared state representations, that suffice for reliable and low-cost transfer. Moreover, we show that by explicitly leveraging a compact representation to encode changes, we can adapt the policy with only a few samples without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that allow for changes in different components of Cartpole and Atari games.
Counterfactual Explanations in Sequential Decision Making Under Uncertainty
Tsirtsis, Stratis, De, Abir, Gomez-Rodriguez, Manuel
Methods to find counterfactual explanations have predominantly focused on one-step decision making processes. In this work, we initiate the development of methods to find counterfactual explanations for decision making processes in which multiple, dependent actions are taken sequentially over time. We start by formally characterizing a sequence of actions and states using finite horizon Markov decision processes and the Gumbel-Max structural causal model. Building upon this characterization, we formally state the problem of finding counterfactual explanations for sequential decision making processes. In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions from the observed sequence that could have led the observed process realization to a better outcome. Then, we introduce a polynomial time algorithm based on dynamic programming to build a counterfactual policy that is guaranteed to always provide the optimal counterfactual explanation on every possible realization of the counterfactual environment dynamics. We validate our algorithm using both synthetic and real data from cognitive behavioral therapy and show that the counterfactual explanations our algorithm finds can provide valuable insights to enhance sequential decision making under uncertainty.
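As a rough illustration of the Gumbel-Max mechanism named in the abstract, the sketch below samples a next state as the argmax of log-probabilities plus Gumbel noise, then reuses the same noise under different transition probabilities. The probabilities and function names are hypothetical. A full counterfactual would draw the noise from its posterior given the observed transition; here the noise is simply shared, which is a simplification and not the authors' algorithm.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse transform."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))

def gumbel_max(probs, noise):
    """Return argmax_i (log p_i + g_i); all p_i must be positive."""
    scores = [math.log(p) + g for p, g in zip(probs, noise)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    rng = random.Random(0)
    factual = [0.2, 0.5, 0.3]      # hypothetical transition probabilities
    alternative = [0.6, 0.1, 0.3]  # same state, different action
    noise = [sample_gumbel(rng) for _ in factual]
    print("factual next state:    ", gumbel_max(factual, noise))
    print("alternative next state:", gumbel_max(alternative, noise))
```

Because the randomness lives entirely in the noise vector, fixing it turns the transition into a deterministic function of the probabilities, which is what allows "what would have happened under another action" to be posed as a well-defined question.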
Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification
Gotovos, Alkis, Burkholz, Rebekka, Quackenbush, John, Jegelka, Stefanie
Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
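To make the cross-sectional setting concrete, here is a small sketch of a continuous-time Markov chain over item sets. It assumes, purely for illustration, that each absent item is acquired at its own constant rate independently of the others; the rates and the function name `simulate_acquisitions` are hypothetical, and this is a simulator, not the learning method of the paper.

```python
import random

def simulate_acquisitions(rates, t_obs, rng):
    """Simulate a continuous-time Markov chain over item sets.

    Assumption: each absent item i is acquired at constant positive rate
    rates[i], independently of the others. Returns the acquisition order
    up to the observation time t_obs.
    """
    absent = list(range(len(rates)))
    order, t = [], 0.0
    while absent:
        weights = [rates[i] for i in absent]
        t += rng.expovariate(sum(weights))  # waiting time to the next jump
        if t >= t_obs:
            break
        item = rng.choices(absent, weights=weights)[0]
        absent.remove(item)
        order.append(item)
    return order

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        order = simulate_acquisitions([2.0, 1.0, 0.5], t_obs=1.0, rng=rng)
        # Cross-sectional data keeps only the set, not the order.
        print("order:", order, "-> observed set:", sorted(order))
```

Different acquisition orders collapse to the same observed set, which gives some intuition for why the learning task can be underspecified when only such snapshots are available.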
Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review
Li, Irene, Pan, Jessica, Goldwasser, Jeremy, Verma, Neha, Wong, Wai Pan, Nuzumlalı, Muhammed Yavuz, Rosand, Benjamin, Li, Yixin, Zhang, Matthew, Chang, David, Taylor, R. Andrew, Krumholz, Harlan M., Radev, Dragomir
Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenotyping, knowledge graphs, medical dialogue, multilinguality, interpretability, etc.
Antithetic Riemannian Manifold And Quantum-Inspired Hamiltonian Monte Carlo
Mongwe, Wilson Tsakane, Mbuvha, Rendani, Marwala, Tshilidzi
Markov Chain Monte Carlo inference of target posterior distributions in machine learning is predominantly conducted via Hamiltonian Monte Carlo and its variants. This is due to the ability of Hamiltonian Monte Carlo based samplers to suppress random-walk behaviour. As with other Markov Chain Monte Carlo methods, Hamiltonian Monte Carlo produces auto-correlated samples, which results in high variance in the estimators and low effective sample size rates in the generated samples. Adding antithetic sampling to Hamiltonian Monte Carlo has been previously shown to produce higher effective sample rates compared to vanilla Hamiltonian Monte Carlo. In this paper, we present new algorithms which are antithetic versions of Riemannian Manifold Hamiltonian Monte Carlo and Quantum-Inspired Hamiltonian Monte Carlo. The Riemannian Manifold Hamiltonian Monte Carlo algorithm improves on Hamiltonian Monte Carlo by taking into account the local geometry of the target, which is beneficial for target densities that may exhibit strong correlations in the parameters. Quantum-Inspired Hamiltonian Monte Carlo is based on quantum particles that can have random mass. Quantum-Inspired Hamiltonian Monte Carlo uses a random mass matrix, which results in better sampling than Hamiltonian Monte Carlo on spiky and multi-modal distributions such as jump diffusion processes. The analysis is performed on jump diffusion processes using real world financial market data, as well as on real world benchmark classification tasks using Bayesian logistic regression.
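The variance-reduction idea behind antithetic sampling can be shown with plain Monte Carlo, which is much simpler than the antithetic Hamiltonian Monte Carlo samplers described above and is not the authors' algorithm. The sketch below estimates E[exp(U)] for U uniform on [0, 1] using either independent pairs or negatively correlated pairs (u, 1 - u); the function name `pair_means` and the test integrand are assumptions chosen for illustration.

```python
import math
import random
import statistics

def pair_means(f, n_pairs, rng, antithetic):
    """Return the per-pair averages of f over n_pairs pairs of uniforms.

    With antithetic=True the second draw is 1 - u, which is negatively
    correlated with u; otherwise it is an independent uniform draw.
    """
    means = []
    for _ in range(n_pairs):
        u = rng.random()
        v = 1.0 - u if antithetic else rng.random()
        means.append(0.5 * (f(u) + f(v)))
    return means

if __name__ == "__main__":
    rng = random.Random(0)
    for flag in (False, True):
        m = pair_means(math.exp, 5000, rng, antithetic=flag)
        print(f"antithetic={flag}: estimate={statistics.mean(m):.4f}, "
              f"per-pair variance={statistics.variance(m):.5f}")
    print(f"exact value: {math.e - 1:.4f}")
```

For a monotone integrand such as exp, the two members of an antithetic pair err in opposite directions, so their average has lower variance than the average of two independent draws.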
Multi-Modal Mutual Information (MuMMI) Training for Robust Self-Supervised Deep Reinforcement Learning
Chen, Kaiqi, Lee, Yong, Soh, Harold
This work focuses on learning useful and robust deep world models using multiple, possibly unreliable, sensors. We find that current methods do not sufficiently encourage a shared representation between modalities; this can cause poor performance on downstream tasks and over-reliance on specific sensors. As a solution, we contribute a new multi-modal deep latent state-space model, trained using a mutual information lower-bound. The key innovation is a specially-designed density ratio estimator that encourages consistency between the latent codes of each modality. We tasked our method to learn policies (in a self-supervised manner) on multi-modal Natural MuJoCo benchmarks and a challenging Table Wiping task. Experiments show our method significantly outperforms state-of-the-art deep reinforcement learning methods, particularly in the presence of missing observations.
Pedestrian Emergence Estimation and Occlusion-Aware Risk Assessment for Urban Autonomous Driving
Koc, Mert, Yurtsever, Ekim, Redmill, Keith, Ozguner, Umit
Avoiding unseen or partially occluded vulnerable road users (VRUs) is a major challenge for fully autonomous driving in urban scenes. However, occlusion-aware risk assessment systems have not been widely studied. Here, we propose a pedestrian emergence estimation and occlusion-aware risk assessment system for urban autonomous driving. First, the proposed system utilizes available contextual information, such as visible cars and pedestrians, to estimate pedestrian emergence probabilities in occluded regions. These probabilities are then used in a risk assessment framework, and incorporated into a longitudinal motion controller. The proposed controller is tested against several baseline controllers that recapitulate some commonly observed driving styles. The simulated test scenarios include randomly placed parked cars and pedestrians, most of whom are occluded from the ego vehicle's view and emerge randomly. The proposed controller outperformed the baselines in terms of safety and comfort measures.
Low-Dimensional State and Action Representation Learning with MDP Homomorphism Metrics
Botteghi, Nicolò, Poel, Mannes, Sirmacek, Beril, Brune, Christoph
In the last decade, Deep Reinforcement Learning [1] algorithms have solved increasingly complicated problems in many different domains, spanning from video games [2] to numerous robotics applications [3], in an end-to-end fashion. Despite the success of end-to-end Reinforcement Learning, these methods suffer from low sample efficiency and usually require lengthy and expensive training procedures to learn optimal behaviours. This problem is even more pronounced when the true state of the environment is not observable, and the observation space O or the action space A is high-dimensional. In end-to-end settings, due to the weak supervision of the reward signal, Reinforcement Learning algorithms are not forced to learn good state representations of the environment, making the mapping from observations to actions challenging to learn and interpret. State representation learning [4] methods aim at reducing the dimensionality of the observation stream by learning a mapping from the observation space O to a lower-dimensional state space S containing only the meaningful features needed for solving a given task. By employing self-supervised auxiliary losses, it is possible to enforce optimal state representation and learn models of the underlying Markov Decision Process, or MDP. When policies are learned using the abstract or latent state-space variables, the training time is often reduced, and the sample efficiency, robustness, and generalisation capabilities of the policies improve compared to end-to-end Reinforcement Learning [5], [6], [7]. While the problem of state representation and observation compression has been extensively treated [4], only a few works have extended the concept of dimensionality reduction to the action space A. In this category, we find the works done in [8], [9] and [10] where low-dimensional action representations are used to improve training efficiency