Oregon State University
Avoiding Negative Side Effects of Autonomous Systems in the Open World
Saisubramanian, Sandhya (Oregon State University) | Kamar, Ece (Microsoft Research) | Zilberstein, Shlomo (University of Massachusetts Amherst)
Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations of precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability, and safety of the system. We present two complementary approaches to mitigating NSEs: (1) learning from feedback, and (2) environment shaping. These approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with an NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent's assigned task is prioritized over mitigating NSEs. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent's task in order to mitigate NSEs. In environment shaping, we examine how a human can assist an agent beyond providing feedback, utilizing their broader scope of knowledge to mitigate the impacts of NSEs. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSEs during its operation. The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSEs, without affecting the agent's ability to complete its assigned task. We present an algorithm for shaping and analyze its properties.
Empirical evaluations demonstrate the trade-offs in the performance of the different approaches to mitigating NSEs in different settings.
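The slack formulation described above can be illustrated with a minimal sketch (not the authors' implementation): among candidate policies, only those whose expected task reward is within the slack of the optimum are admissible, and the agent prefers the admissible policy with the lowest expected NSE penalty. The policy names, rewards, and penalties below are hypothetical.

```python
# Illustrative sketch of slack-based lexicographic policy selection:
# the assigned task is prioritized; NSE mitigation is a secondary objective.

def select_policy(policies, slack):
    """Pick the policy with the lowest expected NSE penalty among those
    whose expected task reward is within `slack` of the best reward."""
    best_reward = max(r for _, r, _ in policies)
    admissible = [p for p in policies if p[1] >= best_reward - slack]
    return min(admissible, key=lambda p: p[2])

# Each tuple: (name, expected task reward, expected NSE penalty) -- toy values.
candidates = [("pi_0", 10.0, 5.0), ("pi_1", 9.5, 1.0), ("pi_2", 7.0, 0.0)]
```

With zero slack the agent keeps the task-optimal policy regardless of NSEs; as the slack grows, it can trade task reward for NSE reduction.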
Monte-Carlo Planning for Agile Legged Locomotion
Clary, Patrick (Oregon State University) | Morais, Pedro (Oregon State University) | Fern, Alan (Oregon State University) | Hurst, Jonathan (Oregon State University)
Recent progress in legged locomotion research has produced robots that can perform agile blind-walking with robustness comparable to a blindfolded human. However, this walking approach has not yet been integrated with planners for high-level activities. In this paper, we take a step towards high-level task planning for these robots by studying a planar simulated biped that captures their essential dynamics. We investigate variants of Monte-Carlo Tree Search (MCTS) for selecting an appropriate blind-walking controller at each decision cycle. In particular, we consider UCT with an intelligently selected rollout policy, which is shown to be capable of guiding the biped through treacherous terrain. In addition, we develop a new MCTS variant, called Monte-Carlo Discrepancy Search (MCDS), which is shown to make more effective use of limited planning time than UCT for this domain. We demonstrate the effectiveness of these planners in both deterministic and stochastic environments across a range of algorithm parameters. In addition, we present results for using these planners to control a full-order 3D simulation of Cassie, an agile bipedal robot, through complex terrain.
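For readers unfamiliar with UCT's inner loop, the following sketch shows the standard UCB1 child-selection rule it is built on (a generic illustration, not the paper's planner; the counts and values are hypothetical).

```python
import math

# UCB1 selection as used inside UCT: balance exploitation (mean value)
# against exploration (visit-count bonus).

def ucb1_select(values, counts, total_visits, c=1.4):
    """Return the index of the child maximizing mean value + exploration bonus."""
    def score(i):
        if counts[i] == 0:
            return float("inf")  # always try unvisited children first
        return values[i] / counts[i] + c * math.sqrt(
            math.log(total_visits) / counts[i])
    return max(range(len(values)), key=score)
```

At each decision cycle, a planner of this kind would treat each available blind-walking controller as a child action and run simulated rollouts to populate the values and counts.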
Training Deep Reactive Policies for Probabilistic Planning Problems
Issakkimuthu, Murugeswari (Oregon State University) | Fern, Alan (Oregon State University) | Tadepalli, Prasad (Oregon State University)
State-of-the-art probabilistic planners typically apply look-ahead search and reasoning at each step to make a decision. While this approach can enable high-quality decisions, it can be computationally expensive for problems that require fast decision making. In this paper, we investigate the potential for deep learning to replace search by fast reactive policies. We focus on supervised learning of deep reactive policies for probabilistic planning problems described in RDDL. A key challenge is to explore the large design space of network architectures and training methods, which was critical to prior deep learning successes. We investigate a number of choices in this space and conduct experiments across a set of benchmark problems. Our results show that effective deep reactive policies can be learned for many benchmark problems and that leveraging the planning problem description to define the network structure can be beneficial.
Detecting Cyberattack Entities from Audit Data via Multi-View Anomaly Detection with Feedback
Siddiqui, Md Amran (Oregon State University) | Fern, Alan (Oregon State University) | Wright, Ryan (Galois, Inc.) | Theriault, Alec (Galois, Inc.) | Archer, David (Galois, Inc.) | Maxwell, William (Galois, Inc.)
In this paper, we consider the problem of detecting unknown cyberattacks from audit data of system-level events. A key challenge is that different cyberattacks will have different suspicion indicators, which are not known beforehand. To address this we consider a multi-view anomaly detection framework, where multiple expert-designed "views" of the data are created for capturing features that may serve as potential indicators. Anomaly detectors are then applied to each view and the results are combined to yield an overall suspiciousness ranking of system entities. Unfortunately, there is often a mismatch between what anomaly detection algorithms find and what is actually malicious, which can result in many false positives. This problem is made even worse in the multi-view setting, where only a small subset of the views may be relevant to detecting a particular cyberattack. To help reduce the false positive rate, a key contribution of this paper is to incorporate feedback from security analysts about whether proposed suspicious entities are of interest or likely benign. This feedback is incorporated into subsequent anomaly detection in order to improve the suspiciousness ranking toward entities that are truly of interest to the analyst. For this purpose, we propose an easy-to-implement variant of the perceptron learning algorithm, which is shown to be quite effective on benchmark datasets. We evaluate our overall approach on real attack data from a DARPA red team exercise, which includes multiple attacks on multiple operating systems. The results show that the incorporation of feedback can significantly reduce the time required to identify malicious system entities.
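The flavor of the feedback loop can be sketched as a perceptron-style reweighting of per-view anomaly scores (an illustration of the idea, not the paper's exact algorithm; the scores and labels below are hypothetical).

```python
# Minimal sketch: learn per-view weights from analyst feedback so that
# the weighted score ranks "interesting" entities above "benign" ones.

def rerank_with_feedback(view_scores, labels, epochs=10, lr=0.1):
    """view_scores: one score vector per entity (one score per view).
    labels: +1 = analyst marked interesting, -1 = benign, 0 = unlabeled."""
    n_views = len(view_scores[0])
    w = [1.0] * n_views  # start with uniform view weights
    for _ in range(epochs):
        for scores, y in zip(view_scores, labels):
            if y == 0:
                continue  # no feedback on this entity yet
            combined = sum(wi * si for wi, si in zip(w, scores))
            if y * combined <= 0:  # misranked: perceptron-style update
                w = [wi + lr * y * si for wi, si in zip(w, scores)]
    return w

# Toy example: view 0 is informative, view 1 fires on a benign entity.
views = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
labels = [1, -1, -1]
w = rerank_with_feedback(views, labels)
```

After feedback, the weight on the view that only flagged benign activity is driven down, pushing truly interesting entities toward the top of the ranking.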
Can We Achieve Open Category Detection with Guarantees?
Liu, Si (Oregon State University) | Garrepalli, Risheek (Oregon State University) | Fern, Alan (Oregon State University) | Dietterich, Thomas G. (Oregon State University)
Open category detection is the problem of detecting "alien" test instances that belong to categories/classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring safety and/or quality of test data analysis. Unfortunately, to the best of our knowledge, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien-detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing these gaps by studying a simplified, but practically relevant, variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest. However, at test time, some fraction $\alpha$ of the test examples are aliens. Under the assumption that we know an upper bound on $\alpha$, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Our empirical results on synthetic and benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.
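The false-alarm side of the trade-off can be illustrated with a simple empirical-quantile threshold on a novelty score (this mirrors the flavor of the setting, not the paper's exact algorithm; the scores below are hypothetical).

```python
# Sketch: choose a score cutoff so that at most a target fraction of
# clean (non-alien) points would be flagged as aliens.

def alien_threshold(clean_scores, false_alarm_rate):
    """Return a cutoff: test points scoring above it are flagged as aliens.
    Chosen as the (1 - false_alarm_rate) empirical quantile of clean data."""
    s = sorted(clean_scores)
    k = int((1.0 - false_alarm_rate) * len(s))
    k = min(k, len(s) - 1)
    return s[k]

# Toy clean-data novelty scores (higher = more novel).
clean = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
t = alien_threshold(clean, false_alarm_rate=0.2)
flagged = [x for x in clean if x > t]
```

The paper's contribution is to make this kind of construction rigorous: with a known upper bound on the alien fraction, the detection rate itself can be guaranteed in a PAC sense rather than only the false-alarm rate.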
Efficient Exploration for Constrained MDPs
Taleghan, Majid Alkaee (Oregon State University) | Dietterich, Thomas G. (Oregon State University)
Given a Markov Decision Process (MDP) defined by a simulator, a designated starting state $s_0$, and a downside risk constraint defined as the probability of reaching catastrophic states, our goal is to find a stationary deterministic policy $\pi$ that with probability $1-\delta$ achieves a value $V^\pi(s_0)$ that is within $\epsilon$ of the value of the optimal stationary deterministic $\nu$-feasible policy, $V^*(s_0)$, while economizing on the number of calls to the simulator. This paper presents the first {\bf PAC-Safe-RL} algorithm for this purpose. The algorithm extends PAC-RL algorithms for efficient exploration while providing guarantees that the downside constraint is satisfied. Experiments comparing our {\sc ConstrainedDDV} algorithm to baselines show substantial reductions in the number of simulator calls required to find a feasible policy.
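As a point of reference for the constraint being enforced, the downside risk of a fixed policy can be estimated naively by simulator rollouts (a sketch only; the names `simulate` and `nu` are illustrative, and the paper's contribution is precisely to avoid this kind of brute-force sampling).

```python
# Sketch: Monte Carlo estimate of the probability that a policy reaches
# a catastrophic state, and a nu-feasibility check against it.

def estimate_risk(simulate, policy, n_rollouts=1000, seed=0):
    """simulate(policy, seed) -> True iff the rollout hit a catastrophe."""
    hits = sum(simulate(policy, s) for s in range(seed, seed + n_rollouts))
    return hits / n_rollouts

def is_feasible(simulate, policy, nu, n_rollouts=1000):
    """A policy is nu-feasible if its catastrophe probability is at most nu."""
    return estimate_risk(simulate, policy, n_rollouts) <= nu

# Toy simulator: 10% of rollouts end in catastrophe, independent of policy.
def toy_simulate(policy, seed):
    return seed % 10 == 0
```

A PAC-Safe-RL algorithm must certify such a constraint for candidate policies while spending far fewer simulator calls than naive per-policy estimation.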
On Convergence of Epanechnikov Mean Shift
Huang, Kejun (University of Minnesota) | Fu, Xiao (Oregon State University) | Sidiropoulos, Nicholas D. (University of Virginia)
Epanechnikov Mean Shift is a simple yet empirically very effective algorithm for clustering. It localizes the centroids of data clusters via estimating modes of the probability distribution that generates the data points, using the "optimal" Epanechnikov kernel density estimator. However, since the procedure involves non-smooth kernel density functions, the convergence behavior of Epanechnikov mean shift lacks theoretical support as of this writing---most of the existing analyses are based on smooth functions and thus cannot be applied to Epanechnikov Mean Shift. In this work, we first show that the original Epanechnikov Mean Shift may indeed terminate at a non-critical point, due to its non-smooth nature. Based on our analysis, we propose a simple remedy to fix it. The modified Epanechnikov Mean Shift is guaranteed to terminate at a local maximum of the estimated density, which corresponds to a cluster centroid, within a finite number of iterations. We also propose a way to avoid running the Mean Shift iterates from every data point, while maintaining good clustering accuracies under non-overlapping spherical Gaussian mixture models. This further pushes Epanechnikov Mean Shift to handle very large and high-dimensional data sets. Experiments show surprisingly good performance compared to Lloyd's K-means algorithm and the EM algorithm.
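To make the procedure concrete: with the Epanechnikov kernel, one mean-shift step simply moves the current estimate to the mean of all points within the bandwidth. The 1-D toy below illustrates the basic iterate (not the paper's modified algorithm, which adds the fix needed for convergence guarantees).

```python
# Sketch of the basic Epanechnikov mean-shift iterate on 1-D data.

def epanechnikov_shift(x, points, h):
    """One mean-shift step: average of the points within distance h of x."""
    window = [p for p in points if abs(p - x) <= h]
    if not window:
        return x  # no neighbors in the window: stay put
    return sum(window) / len(window)

def mean_shift(x, points, h, max_iter=100, tol=1e-8):
    """Iterate the shift until the estimate stops moving."""
    for _ in range(max_iter):
        x_new = epanechnikov_shift(x, points, h)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

# Two well-separated toy clusters; starting points converge to their modes.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
```

Because the window membership changes discontinuously as x moves, the estimated density is non-smooth, which is exactly why the naive iterate can stall at a non-critical point and why the paper's modification is needed.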
Comparing Reward Shaping, Visual Hints, and Curriculum Learning
Pocius, Rey (Oregon State University) | Isele, David (University of Pennsylvania) | Roberts, Mark (United States Naval Research Laboratory) | Aha, David W. (United States Naval Research Laboratory)
Common approaches to learning complex tasks in reinforcement learning include reward shaping, environmental hints, or a curriculum. Yet few studies examine how they compare to each other, when one might prefer one approach, or how they may complement each other. As a first step in this direction, we compare reward shaping, hints, and curricula for a Deep RL agent in the game of Minecraft. We seek to answer whether reward shaping, visual hints, or the curricula have the most impact on performance, which we measure as the time to reach the target, the distance from the target, the cumulative reward, or the number of actions taken. Our analyses show that performance is most impacted by the curriculum used and visual hints; shaping had less impact. For similar navigation tasks, the results suggest that designing an effective curriculum and providing appropriate hints most improve performance.
Multi-Task Medical Concept Normalization Using Multi-View Convolutional Neural Network
Luo, Yi (University of California, San Diego) | Song, Guojie (Peking University) | Li, Pengyu (Peking University) | Qi, Zhongang (Oregon State University)
Medical concept normalization is a critical problem in biomedical research and clinical applications. In this paper, we focus on normalizing diagnostic and procedure names in Chinese discharge summaries to standard entities, which is formulated as a semantic matching problem. However, non-standard Chinese expressions, short-text normalization, and heterogeneity of tasks pose critical challenges in our problem. This paper presents a general framework which introduces a tensor generator and a novel multi-view convolutional neural network (CNN) with a multi-task shared structure to tackle the two tasks simultaneously. We propose that the key to addressing non-standard expressions and the short-text problem is to incorporate a matching tensor with multiple granularities. A multi-view CNN is then adopted to extract semantic matching patterns and learn to synthesize them from different views. Finally, the multi-task shared structure allows the model to exploit medical correlations between disease and procedure names to better perform disambiguation tasks. Comprehensive experimental analysis indicates our model outperforms existing baselines, demonstrating its effectiveness.
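One granularity of such a matching tensor can be illustrated as a character-level match matrix between a mention and a candidate standard entity (a toy illustration of the input representation, not the paper's tensor generator).

```python
# Sketch: binary character-level match matrix between two strings, one
# slice of a multi-granularity matching tensor fed to a CNN.

def match_matrix(mention, entity):
    """matrix[i][j] = 1.0 iff mention[i] == entity[j]."""
    return [[1.0 if a == b else 0.0 for b in entity] for a in mention]

# Toy example: the colloquial abbreviation matches a subsequence of the
# standard term (myocardial infarction), which a CNN can pick up as a
# diagonal-like pattern.
m = match_matrix("心梗", "心肌梗死")
```

Coarser granularities (word- or embedding-level similarities) would form the other slices of the tensor, letting the CNN views specialize in different matching patterns.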
Predicting Links in Plant-Pollinator Interaction Networks Using Latent Factor Models With Implicit Feedback
Seo, Eugene (Oregon State University) | Hutchinson, Rebecca A. (Oregon State University)
Plant-pollinator interaction networks are bipartite networks representing the mutualistic interactions between a set of plant species and a set of pollinator species. Data on these networks are collected by field biologists, who count visits from pollinators to flowers. Ecologists study the structure and function of these networks for scientific, conservation, and agricultural purposes. However, little research has been done to understand the underlying mechanisms that determine pairwise interactions or to predict new links from networks describing the species community. This paper explores the use of latent factor models to predict interactions that will occur in new contexts (e.g. a different distribution of the set of plant species) based on an observed network. The analysis draws on algorithms and evaluation strategies developed for recommendation systems and introduces them to this new domain. The matrix factorization methods compare favorably against several baselines on a pollination dataset collected in montane meadows over several years. Incorporating both positive and negative implicit feedback into the matrix factorization methods is particularly promising.
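A minimal latent-factor sketch of this setup, with observed visits treated as implicit feedback (1 = visit observed, 0 = none), is below. It is a toy SGD matrix factorization for illustration, not the paper's exact model; the species, data, and hyperparameters are hypothetical.

```python
import random

# Sketch: factorize a plant x pollinator interaction matrix into latent
# factors, then predict unobserved links via dot products.

def factorize(interactions, n_plants, n_pollinators, k=2,
              lr=0.05, reg=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_plants)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_pollinators)]
    for _ in range(epochs):
        for i, j, r in interactions:  # r in {0, 1}: implicit feedback
            pred = sum(P[i][f] * Q[j][f] for f in range(k))
            err = r - pred
            for f in range(k):  # regularized SGD step on both factors
                pif, qjf = P[i][f], Q[j][f]
                P[i][f] += lr * (err * qjf - reg * pif)
                Q[j][f] += lr * (err * pif - reg * qjf)
    return P, Q

def predict(P, Q, i, j):
    """Predicted interaction strength between plant i and pollinator j."""
    return sum(pf * qf for pf, qf in zip(P[i], Q[j]))

# Toy network: plant 0 is visited by pollinator 0, plant 1 by pollinator 1.
interactions = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
P, Q = factorize(interactions, n_plants=2, n_pollinators=2)
```

Including the zeros as negative implicit feedback, as here, is the choice the abstract highlights as particularly promising, since field data rarely contain explicit "no interaction" labels.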