Dietterich, Thomas G.


Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning

arXiv.org Machine Learning

Exogenous state variables and rewards can slow down reinforcement learning by injecting uncontrolled variation into the reward signal. We formalize exogenous state variables and rewards and identify conditions under which an MDP with exogenous state can be decomposed into an exogenous Markov Reward Process involving only the exogenous state and reward, and an endogenous Markov Decision Process defined with respect to only the endogenous rewards. We also derive a variance-covariance condition under which Monte Carlo policy evaluation on the endogenous MDP is accelerated compared to using the full MDP. Similar speedups are likely to carry over to all RL algorithms. We develop two algorithms for discovering the exogenous variables and test them on several MDPs. Results show that the algorithms are practical and can significantly speed up reinforcement learning.
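Why removing the exogenous reward helps Monte Carlo evaluation can be illustrated with a small simulation. The sketch assumes the return splits additively as G = G_endo + G_exo and uses the identity Var(G_endo + G_exo) = Var(G_endo) + Var(G_exo) + 2 Cov(G_endo, G_exo): dropping G_exo lowers the variance whenever Var(G_exo) + 2 Cov > 0. This is offered as an illustrative condition of that kind, not necessarily the exact condition derived in the paper, and the toy reward process is hypothetical. Since a Monte Carlo estimate from n episodes has variance Var(G)/n, lower return variance means fewer episodes for the same accuracy.

```python
import random
import statistics

def sample_returns(n_episodes, horizon, seed=0):
    """Hypothetical toy process: the endogenous reward depends on the action,
    while the exogenous reward is noise the agent cannot influence."""
    rng = random.Random(seed)
    endo, exo = [], []
    for _ in range(n_episodes):
        g_endo = g_exo = 0.0
        for _ in range(horizon):
            action = rng.choice([0, 1])
            g_endo += 1.0 if action == 1 else 0.0
            g_exo += rng.gauss(0.0, 5.0)  # large uncontrolled variation
        endo.append(g_endo)
        exo.append(g_exo)
    return endo, exo

def covariance(xs, ys):
    """Sample covariance with the same n - 1 denominator as statistics.variance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

endo, exo = sample_returns(n_episodes=2000, horizon=10)
full = [a + b for a, b in zip(endo, exo)]

var_endo = statistics.variance(endo)
var_full = statistics.variance(full)
# Variance removed by evaluating on the endogenous reward only:
# Var(endo + exo) - Var(endo) = Var(exo) + 2 Cov(endo, exo)
gain = statistics.variance(exo) + 2 * covariance(endo, exo)
print(f"Var(full)={var_full:.1f}  Var(endo)={var_endo:.1f}  gain={gain:.1f}")
```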


Efficient Exploration for Constrained MDPs

AAAI Conferences

Given a Markov Decision Process (MDP) defined by a simulator, a designated starting state $s_0$, and a downside risk constraint that bounds by $\nu$ the probability of reaching catastrophic states, our goal is to find a stationary deterministic policy $\pi$ that with probability $1-\delta$ achieves a value $V^\pi(s_0)$ within $\epsilon$ of the value $V^*(s_0)$ of the optimal stationary deterministic $\nu$-feasible policy, while economizing on the number of calls to the simulator. This paper presents the first PAC-Safe-RL algorithm for this purpose. The algorithm extends PAC-RL algorithms for efficient exploration while providing guarantees that the downside constraint is satisfied. Experiments comparing our ConstrainedDDV algorithm to baselines show substantial reductions in the number of simulator calls required to find a feasible policy.
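One basic ingredient of simulator-based safe exploration is certifying, from rollouts, that a policy's probability of reaching a catastrophic state is at most $\nu$. The sketch below does this with a one-sided Hoeffding upper confidence bound, spending $\delta/(n(n+1))$ at the n-th check so that checking after every call still holds with probability at least $1-\delta$ overall. It is not the ConstrainedDDV algorithm; the simulator callback `rollout_hits_catastrophe` and the budget parameter are hypothetical.

```python
import math
from typing import Callable, Tuple

def hoeffding_upper(p_hat: float, n: int, delta: float) -> float:
    """One-sided Hoeffding bound: with prob. >= 1 - delta, p <= p_hat + radius."""
    return p_hat + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def certify_feasible(rollout_hits_catastrophe: Callable[[], bool],
                     nu: float, delta: float,
                     max_calls: int = 100_000) -> Tuple[bool, int]:
    """Roll out the policy until the upper bound on the catastrophe
    probability falls to nu or below, or the call budget runs out.
    Returns (certified, number_of_simulator_calls)."""
    hits = 0
    for n in range(1, max_calls + 1):
        hits += int(rollout_hits_catastrophe())
        # Union bound over all checks: sum_n delta / (n (n + 1)) = delta.
        if hoeffding_upper(hits / n, n, delta / (n * (n + 1))) <= nu:
            return True, n
    return False, max_calls
```

A policy that never hits a catastrophe is certified for $\nu = 0.1$, $\delta = 0.05$ after a few hundred calls; tighter $\nu$ requires roughly $1/\nu^2$ times more rollouts, which is why economizing on simulator calls matters.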


Factoring Exogenous State for Model-Free Monte Carlo

arXiv.org Machine Learning

Policy analysts wish to visualize a range of policies for large simulator-defined Markov Decision Processes (MDPs). One visualization approach is to invoke the simulator to generate on-policy trajectories and then visualize those trajectories. When the simulator is expensive, this is not practical, and some method is required for generating trajectories for new policies without invoking the simulator. The method of Model-Free Monte Carlo (MFMC) can do this by stitching together state transitions for a new policy based on previously-sampled trajectories from other policies. This "off-policy Monte Carlo simulation" method works well when the state space has low dimension but fails as the dimension grows. This paper describes a method for factoring out some of the state and action variables so that MFMC can work in high-dimensional MDPs. The new method, MFMCi, is evaluated on a very challenging wildfire management MDP.
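The stitching step of plain MFMC (not MFMCi) can be sketched as follows. The sketch assumes discrete actions, numeric state vectors compared by Euclidean distance, and that each stored transition is used at most once per synthesized trajectory; the tuple layout and function name are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]  # (s, a, r, s_next)

def mfmc_trajectory(db: Sequence[Transition], policy: Callable[[State], int],
                    s0: State, horizon: int) -> List[Transition]:
    """Synthesize one trajectory for `policy` without calling the simulator:
    at each step take the unused stored transition whose action matches
    policy(s) and whose start state is nearest to s, then jump to its
    recorded next state."""
    unused = list(db)
    s, traj = s0, []
    for _ in range(horizon):
        a = policy(s)
        candidates = [t for t in unused if t[1] == a]
        if not candidates:
            break  # no stored transitions left for this action
        best = min(candidates, key=lambda t: math.dist(t[0], s))
        unused.remove(best)  # each transition is used at most once
        traj.append(best)
        s = best[3]
    return traj
```

Averaging the returns of several such synthetic trajectories gives an off-policy estimate. The nearest-neighbour match degrades as the state dimension grows, which is the failure mode the abstract describes.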



Incorporating Feedback into Tree-based Anomaly Detection

arXiv.org Machine Learning

Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, in real-world applications, this process can be exceedingly difficult for the analyst since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analyst's job easier by allowing for analyst feedback during the investigation process. Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that we can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.
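One simple way to let binary feedback reshape a tree-based ranking is to score each instance as a weighted sum over the tree leaves it falls into and adjust the weights after each label. The sketch assumes the per-leaf feature values have already been computed from a trained forest (here they are given directly as hypothetical dictionaries) and uses a plain perceptron-style update; it illustrates the idea and is not the paper's update rule.

```python
from typing import Callable, Dict, Hashable, List

Features = Dict[Hashable, float]  # leaf id -> feature value for one instance

def score(w: Dict[Hashable, float], phi: Features) -> float:
    """Weighted anomaly score; unseen leaves start with weight 1."""
    return sum(w.get(leaf, 1.0) * v for leaf, v in phi.items())

def rank(w: Dict[Hashable, float], data: Dict[str, Features]) -> List[str]:
    """Instance ids sorted from most to least anomalous."""
    return sorted(data, key=lambda i: score(w, data[i]), reverse=True)

def update(w: Dict[Hashable, float], phi: Features, is_anomaly: bool,
           lr: float = 0.5) -> None:
    """Raise the weights of leaves shared with a confirmed anomaly,
    lower them after a false positive."""
    sign = 1.0 if is_anomaly else -1.0
    for leaf, v in phi.items():
        w[leaf] = w.get(leaf, 1.0) + sign * lr * v

def investigate(data: Dict[str, Features], is_true_anomaly: Callable[[str], bool],
                budget: int) -> List[str]:
    """Show the analyst the top unseen instance, learn from the label, re-rank."""
    w: Dict[Hashable, float] = {}
    seen, found = set(), []
    for _ in range(budget):
        candidates = [i for i in rank(w, data) if i not in seen]
        if not candidates:
            break
        top = candidates[0]
        seen.add(top)
        label = is_true_anomaly(top)
        if label:
            found.append(top)
        update(w, data[top], label)
    return found
```

Because a false positive lowers the weights of every leaf it occupies, other instances sharing those leaves also drop in the ranking, which is how one label can remove several false positives at once.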


Learning Greedy Policies for the Easy-First Framework

AAAI Conferences

Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. We formulate greedy policy learning in the Easy-first approach as a novel non-convex optimization problem and solve it via an efficient Majorization Minimization (MM) algorithm. Results on within-document coreference and cross-document joint entity and event coreference tasks demonstrate that the proposed approach achieves statistically significant performance improvement over existing training regimes for Easy-first and is less susceptible to overfitting.
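The greedy inference loop that such a policy drives can be sketched as follows. The scoring function here is a hypothetical stand-in for the learned action scorer and may inspect the decisions already committed; learning the scorer (the MM procedure) is not shown.

```python
from typing import Callable, Dict, Hashable, Iterable

Decision = Hashable
Label = Hashable
ScoreFn = Callable[[Decision, Label, Dict[Decision, Label]], float]

def easy_first(decisions: Iterable[Decision], labels: Iterable[Label],
               score: ScoreFn) -> Dict[Decision, Label]:
    """Repeatedly commit to the single highest-scoring (decision, label)
    action among all pending decisions. Earlier commitments are passed to
    `score`, so they can make the remaining decisions easier."""
    pending = list(dict.fromkeys(decisions))  # ordered, so ties break deterministically
    labels = list(labels)
    assigned: Dict[Decision, Label] = {}
    while pending:
        d, y = max(((d, y) for d in pending for y in labels),
                   key=lambda dy: score(dy[0], dy[1], assigned))
        assigned[d] = y  # insertion order records the easy-first order
        pending.remove(d)
    return assigned
```

The design choice is that every step re-scores all remaining decisions, so a confident early decision can raise the score of a related one that was ambiguous at the start.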


Sequential Feature Explanations for Anomaly Detection

arXiv.org Machine Learning

In many applications, an anomaly detection system presents the most anomalous data instance to a human analyst, who then must determine whether the instance is truly of interest (e.g. a threat in a security setting). Unfortunately, most anomaly detectors provide no explanation about why an instance was considered anomalous, leaving the analyst with no guidance about where to begin the investigation. To address this issue, we study the problems of computing and evaluating sequential feature explanations (SFEs) for anomaly detectors. An SFE of an anomaly is a sequence of features, which are presented to the analyst one at a time (in order) until the information contained in the highlighted features is enough for the analyst to make a confident judgement about the anomaly. Since analyst effort is related to the amount of information that they consider in an investigation, an explanation's quality is related to the number of features that must be revealed to attain confidence. One of our main contributions is to present a novel framework for large scale quantitative evaluations of SFEs, where the quality measure is based on analyst effort. To do this we construct anomaly detection benchmarks from real data sets along with artificial experts that can be simulated for evaluation. Our second contribution is to evaluate several novel explanation approaches within the framework and on traditional anomaly detection benchmarks, offering several insights into the approaches.
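The effort-based evaluation can be sketched with a deliberately simple, hypothetical setup: a detector that scores an instance by the sum of squared per-feature z-scores, an explanation that orders features by their individual contribution, and a simulated analyst who becomes confident once the revealed features' evidence reaches a threshold tau. The quality measure is the number of features revealed. None of these modelling choices are taken from the paper.

```python
from typing import List, Sequence

def z_scores(x: Sequence[float], mean: Sequence[float],
             std: Sequence[float]) -> List[float]:
    return [(xi - m) / s for xi, m, s in zip(x, mean, std)]

def contribution_order(z: Sequence[float]) -> List[int]:
    """Explanation: feature indices sorted by their contribution z_i^2."""
    return sorted(range(len(z)), key=lambda i: z[i] ** 2, reverse=True)

def features_needed(order: Sequence[int], z: Sequence[float], tau: float) -> int:
    """Simulated analyst: reveal features in order until the revealed
    evidence (sum of z_i^2) reaches tau. Returns how many were revealed,
    or len(order) + 1 as a penalty if the analyst never becomes confident."""
    evidence = 0.0
    for k, i in enumerate(order, start=1):
        evidence += z[i] ** 2
        if evidence >= tau:
            return k
    return len(order) + 1
```

For an instance with z-scores (0.1, 5.0, -0.2, 3.0) and tau = 30, the contribution ordering needs 2 features while the unordered sequence needs all 4, so the first explanation costs the simulated analyst less effort.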


Learning Scripts as Hidden Markov Models

AAAI Conferences

Scripts have been proposed to model the stereotypical event sequences found in narratives. They can be applied to make a variety of inferences including filling gaps in the narratives and resolving ambiguous references. This paper proposes the first formal framework for scripts based on Hidden Markov Models (HMMs). Our framework supports robust inference and learning algorithms, which are lacking in previous clustering models. We develop an algorithm for structure and parameter learning based on Expectation Maximization and evaluate it on a number of natural datasets. The results show that our algorithm is superior to several informed baselines for predicting missing events in partial observation sequences.
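To show how an HMM supports predicting a missing event, the sketch below runs the forward-backward algorithm on an observation sequence with one gap (marked `None`), treating the gap as carrying no evidence, and returns the predictive distribution over the missing event. It is a plain HMM with hand-set parameters; the structure learning and EM training described in the abstract are not shown, and the event names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def predict_missing(obs: Sequence[Optional[str]], start: List[float],
                    trans: List[List[float]], emit: List[Dict[str, float]],
                    gap: int) -> Dict[str, float]:
    """P(event at position `gap` | all observed events) under an HMM."""
    if obs[gap] is not None:
        raise ValueError("position `gap` must be unobserved (None)")
    n_states, T = len(start), len(obs)

    def e(k: int, t: int) -> float:
        # A missing observation contributes no evidence.
        return 1.0 if obs[t] is None else emit[k].get(obs[t], 0.0)

    # Forward pass: alpha[t][k] = P(o_1..o_t, z_t = k)
    alpha = [[start[k] * e(k, 0) for k in range(n_states)]]
    for t in range(1, T):
        alpha.append([e(k, t) * sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
                      for k in range(n_states)])
    # Backward pass: beta[t][k] = P(o_{t+1}..o_T | z_t = k)
    beta = [[1.0] * n_states for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(trans[k][j] * e(j, t + 1) * beta[t + 1][j] for j in range(n_states))
                   for k in range(n_states)]
    # Posterior over the hidden state at the gap, then mix the emission distributions.
    post = [alpha[gap][k] * beta[gap][k] for k in range(n_states)]
    total = sum(post)
    if total == 0.0:
        raise ValueError("observed events have zero probability under the model")
    pred: Dict[str, float] = {}
    for k in range(n_states):
        for event, p in emit[k].items():
            pred[event] = pred.get(event, 0.0) + (post[k] / total) * p
    return pred
```

For long sequences the unnormalized forward and backward messages underflow; a practical version would rescale them at each step or work in log space.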


Reconstructing Velocities of Migrating Birds from Weather Radar – A Case Study in Computational Sustainability

AI Magazine

Bird migration occurs at the largest of global scales, but monitoring such movements can be challenging. In the US there is an operational network of weather radars providing freely accessible data for monitoring meteorological phenomena in the atmosphere. Individual radars are sensitive enough to detect birds, and can provide insight into migratory behaviors of birds at scales that are not possible using other sensors. Archived data from the WSR-88D network of US weather radars hold valuable and detailed information about the continent-scale migratory movements of birds over the last 20 years. However, significant technical challenges must be overcome to understand this information and harness its potential for science and conservation. We describe recent work on an AI system to quantify bird migration using radar data, which is part of the larger BirdCast project to model and forecast bird migration at large scales using radar, weather, and citizen science data.


Gaussian Approximation of Collective Graphical Models

arXiv.org Machine Learning

The Collective Graphical Model (CGM) models a population of independent and identically distributed individuals when only collective statistics (i.e., counts of individuals) are observed. Exact inference in CGMs is intractable, and previous work has explored Markov Chain Monte Carlo (MCMC) and MAP approximations for learning and inference. This paper studies Gaussian approximations to the CGM. As the population grows large, we show that the CGM distribution converges to a multivariate Gaussian distribution (GCGM) that maintains the conditional independence properties of the original CGM. If the observations are exact marginals of the CGM or marginals that are corrupted by Gaussian noise, inference in the GCGM approximation can be computed efficiently in closed form. If the observations follow a different noise model (e.g., Poisson), then expectation propagation provides efficient and accurate approximate inference. The accuracy and speed of GCGM inference are compared to the MCMC and MAP methods on a simulated bird migration problem. The GCGM matches or exceeds the accuracy of the MAP method while being significantly faster.
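The closed-form Gaussian-noise case can be sketched for the simplest CGM: N individuals each independently fall into one of K cells with probabilities p, so the cell counts are multinomial. For large N the counts are approximated by a Gaussian with mean $Np$ and covariance $N(\mathrm{diag}(p) - pp^\top)$, and a single noisy observation $y = n_k + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, is absorbed by standard Gaussian conditioning. The single-node model and noise level are illustrative assumptions; a full GCGM couples many such nodes through the graph.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def multinomial_gaussian(N: int, p: Vector) -> Tuple[Vector, Matrix]:
    """Mean N p and covariance N (diag(p) - p p^T) of multinomial counts."""
    K = len(p)
    mean = [N * pi for pi in p]
    cov = [[N * ((p[i] if i == j else 0.0) - p[i] * p[j]) for j in range(K)]
           for i in range(K)]
    return mean, cov

def condition_on_noisy_count(mean: Vector, cov: Matrix, k: int, y: float,
                             noise_var: float) -> Tuple[Vector, Matrix]:
    """Exact Gaussian posterior of the counts given y = n_k + eps,
    eps ~ N(0, noise_var)."""
    K = len(mean)
    s = cov[k][k] + noise_var             # variance of the observation
    gain = [cov[i][k] / s for i in range(K)]
    post_mean = [m + g * (y - mean[k]) for m, g in zip(mean, gain)]
    post_cov = [[cov[i][j] - gain[i] * cov[k][j] for j in range(K)]
                for i in range(K)]
    return post_mean, post_cov
```

With N = 100, p = (0.5, 0.3, 0.2), and a noisy count y = 60 for the first cell with noise variance 25, the posterior mean moves to (55, 27, 18). The total stays at N because each row of the multinomial covariance sums to zero, so the update only shifts mass between cells.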