Markov Models
Applications of PageRank to Function Comparison and Malware Classification
Slawinski, Michael A., Wortman, Andy
We classify .NET files as either benign or malicious by examining certain directed graphs extracted from the files via decompilation. Each graph is viewed probabilistically as a Markov chain where each node heuristically represents the possible state of the running file, and by computing the PageRank vector (Perron vector with transport) we can assign a probability measure over the nodes of the given graph. We train a random forest with features derived from computing Lebesgue antiderivatives of functions defined over the vertex sets of the graphs listed above against the PageRank measure. The model was trained on 2.5 million samples of .NET and has an accuracy of 98.3\% on test data. The median time needed for decompilation and scoring was 24ms.
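The PageRank step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the damping factor of 0.85, the uniform handling of dangling nodes, and the function names `pagerank` and `integrate` are all assumptions. On a finite vertex set, integrating a vertex function against the PageRank measure reduces to a weighted sum.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for the PageRank vector of a directed graph.

    adj[i, j] = 1 means there is an edge from node i to node j.
    The damping value and dangling-node rule are illustrative assumptions.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; nodes with no out-edges jump uniformly.
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1.0), 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Follow an edge with probability `damping`, otherwise teleport uniformly.
        new_pi = damping * (pi @ P) + (1.0 - damping) / n
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break
    return pi / pi.sum()

def integrate(f, pi):
    """Integral of a vertex function f against the probability measure pi."""
    return float(np.dot(np.asarray(f, dtype=float), pi))

# Hypothetical 3-node cycle 0 -> 1 -> 2 -> 0; by symmetry the measure is uniform.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
pi_cycle = pagerank(cycle)
mean_f = integrate([1.0, 2.0, 3.0], pi_cycle)
```

Features of this kind (one number per vertex function per graph) could then be passed to a classifier such as a random forest.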
Deep Neural Network Compression for Aircraft Collision Avoidance Systems
Julian, Kyle D., Kochenderfer, Mykel J., Owen, Michael P.
The resulting collision avoidance strategy can be represented as a numeric table. This methodology has been used in the development of the Airborne Collision Avoidance System X (ACAS X) family of collision avoidance systems for manned and unmanned aircraft, but the high dimensionality of the state space leads to very large tables. To improve storage efficiency, a deep neural network is used to approximate the table. With the use of an asymmetric loss function and a gradient descent algorithm, the parameters for this network can be trained to provide accurate estimates of table values while preserving the relative preferences of the possible advisories for each state. By training multiple networks to represent subtables, the network also decreases the required runtime for computing the collision avoidance advisory. Simulation studies show that the network improves the safety and efficiency of the collision avoidance system. Because only the network parameters need to be stored, the required storage space is reduced by a factor of 1000, enabling the collision avoidance system to operate using current avionics systems.
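As a rough illustration of what an asymmetric loss can look like, the sketch below weights over-estimates and under-estimates of a table value differently. The functional form, the weights, and the name `asymmetric_squared_error` are assumptions chosen for illustration; the abstract does not specify the loss actually used.

```python
import numpy as np

def asymmetric_squared_error(pred, target, under_weight=1.0, over_weight=4.0):
    """Mean squared error with different penalties for over- and under-estimation.

    The weights are hypothetical; they only demonstrate the asymmetry.
    """
    err = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    # Positive error means the prediction is above the table value.
    weights = np.where(err > 0, over_weight, under_weight)
    return float(np.mean(weights * err ** 2))

loss_over = asymmetric_squared_error([1.0], [0.0])    # over-estimate by 1
loss_under = asymmetric_squared_error([-1.0], [0.0])  # under-estimate by 1
```

With these assumed weights, an over-estimate of a given size costs four times as much as an under-estimate of the same size, which pushes the fitted network toward one side of the table values.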
Discovering General-Purpose Active Learning Strategies
Konyushkova, Ksenia, Sznitman, Raphael, Fua, Pascal
We propose a general-purpose approach to discovering active learning (AL) strategies from data. These strategies are transferable from one domain to another and can be used in conjunction with many machine learning models. To this end, we formalize the annotation process as a Markov decision process, design universal state and action spaces and introduce a new reward function that precisely model the AL objective of minimizing the annotation cost. We seek to find an optimal (non-myopic) AL strategy using reinforcement learning. We evaluate the learned strategies on multiple unrelated domains and show that they consistently outperform state-of-the-art baselines. Modern supervised machine learning (ML) methods require large annotated datasets for training purposes and the cost of producing them can easily become prohibitive. Active learning (AL) mitigates the problem by intelligently and adaptively selecting a subset of the data to be annotated. To do so, AL typically relies on informativeness measures that identify unlabelled datapoints whose labels are most likely to help improve the performance of the trained model. As a result, good performance is achieved using far fewer annotations than by randomly labelling data. Most AL selection strategies are hand-designed either on the basis of researchers' expertise and intuition or by approximating theoretical criteria (Settles, 2012).
Robot Bed-Making: Deep Transfer Learning Using Depth Sensing of Deformable Fabric
Seita, Daniel, Jamali, Nawid, Laskey, Michael, Berenstein, Ron, Tanwani, Ajay Kumar, Baskaran, Prakash, Iba, Soshi, Canny, John, Goldberg, Ken
Bed-making is a common task well-suited for home robots since it is tolerant to error and not time-critical. Bed-making can also be difficult for senior citizens and those with limited mobility due to the bending and reaching movements required. Autonomous bed-making combines multiple challenges in robotics: perception in unstructured environments, deformable object manipulation, transfer learning, and sequential decision making. We formalize the bed-making problem as one of maximizing surface coverage with a blanket, and explore algorithmic approaches that use deep learning on depth images to be invariant to the color and pattern of the blankets. We train two networks: one to identify a corner of the blanket and another to determine when to transition to the other side of the bed. Using the first network, the robot grasps at its estimate of the blanket corner and then pulls it to the appropriate corner of the bed frame. The second network estimates whether the robot has sufficiently covered one side and can transition to the other, or whether it should attempt another grasp from the same side. We evaluate with two robots, the Toyota HSR and the Fetch, and three blankets. Using 2018 and 654 depth images for training the grasp and transition networks respectively, experiments with a quarter-scale twin bed achieve an average of 91.7% blanket coverage, nearly matching human supervisors with 95.0% coverage. Data is available at https://sites.google.com/view/bed-make. A common home task is bed-making [4], which is rarely enjoyed and can be physically challenging due to bending and leaning movements. Surveys of older adults in the United States [9], [3] suggest that they are willing to have a robot assistant in their homes, particularly for physically demanding tasks.
Distributed Wildfire Surveillance with Autonomous Aircraft using Deep Reinforcement Learning
Julian, Kyle D., Kochenderfer, Mykel J.
Teams of autonomous unmanned aircraft can be used to monitor wildfires, enabling firefighters to make informed decisions. However, controlling multiple autonomous fixed-wing aircraft to maximize forest fire coverage is a complex problem. The state space is high dimensional, the fire propagates stochastically, the sensor information is imperfect, and the aircraft must coordinate with each other to accomplish their mission. This work presents two deep reinforcement learning approaches for training decentralized controllers that accommodate the high dimensionality and uncertainty inherent in the problem. The first approach controls the aircraft using immediate observations of the individual aircraft. The second approach allows aircraft to collaborate on a map of the wildfire's state and maintain a time history of locations visited, which are used as inputs to the controller. Simulation results show that both approaches allow the aircraft to accurately track wildfire expansions and outperform an online receding horizon controller. Additional simulations demonstrate that the approach scales with different numbers of aircraft and generalizes to different wildfire shapes.
Semi-supervised Deep Reinforcement Learning in Support of IoT and Smart City Services
Mohammadi, Mehdi, Al-Fuqaha, Ala, Guizani, Mohsen, Oh, Jun-Seok
Smart services are an important element of smart city and Internet of Things (IoT) ecosystems, where the intelligence behind the services is obtained and improved through sensory data. Providing a large amount of training data is not always feasible; therefore, we need to consider alternative ways that incorporate unlabeled data as well. In recent years, deep reinforcement learning (DRL) has achieved great success in several application domains. It is an applicable method for IoT and smart city scenarios where auto-generated data can be partially labeled by users' feedback for training purposes. In this paper, we propose a semi-supervised deep reinforcement learning model that fits smart city applications as it consumes both labeled and unlabeled data to improve the performance and accuracy of the learning agent. To the best of our knowledge, the proposed model is the first investigation that extends deep reinforcement learning to the semi-supervised paradigm. As a case study of smart city applications, we focus on smart buildings and apply the proposed model to the problem of indoor localization based on BLE signal strength. Indoor localization is the main component of smart city services since people spend significant time in indoor environments. Our model learns the best action policies that lead to a close estimation of the target locations with an improvement of 23% in terms of distance to the target and at least 67% more received rewards compared to the supervised DRL model. The rapid development of Internet of Things (IoT) technologies has motivated researchers and developers to think about new kinds of smart services that extract knowledge from IoT-generated data. The scarcity of labeled data is a main issue in developing such solutions, especially for IoT applications where a large number of sensors participate in generating data without being able to obtain class labels corresponding to the collected data.
This publication was made possible by NPRP grant# [71113-1-199] from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Robot Representing and Reasoning with Knowledge from Reinforcement Learning
Lu, Keting, Zhang, Shiqi, Stone, Peter, Chen, Xiaoping
Reinforcement learning (RL) agents aim at learning by interacting with an environment, and are not designed for representing or reasoning with declarative knowledge. Knowledge representation and reasoning (KRR) paradigms are strong in declarative KRR tasks, but are ill-equipped to learn from such experiences. In this work, we integrate logical-probabilistic KRR with model-based RL, enabling agents to simultaneously reason with declarative knowledge and learn from interaction experiences. The knowledge from humans and RL is unified and used for dynamically computing task-specific planning models under potentially new environments. Experiments were conducted using a mobile robot working on dialog, navigation, and delivery tasks. Results show significant improvements in comparison to existing model-based RL methods.
World-class PyTorch support on Azure
Today we are excited to strengthen our commitment to supporting PyTorch as a first-class framework on Azure, with exciting new capabilities in our Azure Machine Learning public preview refresh. In addition, our PyTorch support extends deeply across many of our AI Platform services and tooling, which we will highlight below. During the past two years since PyTorch's first release in October 2016, we've witnessed the rapid and organic adoption of the deep learning framework among academia, industry, and the AI community at large. While PyTorch's Python-first integration and imperative style have long made the framework a hit among researchers, the latest PyTorch 1.0 release brings the production-level readiness and scalability needed to make it a true end-to-end deep learning platform, from prototyping to production. Azure Machine Learning (Azure ML) service is a cloud-based service that enables data scientists to carry out end-to-end machine learning workflows, from data preparation and training to model management and deployment.
Discretizing Logged Interaction Data Biases Learning for Decision-Making
Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead.
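The hourly-count preprocessing mentioned above can be sketched in a few lines. The arrival times below are made-up illustration data, and the helper name `hourly_counts` is hypothetical; the point is only to show what information the discretized series keeps and what it discards.

```python
import numpy as np

def hourly_counts(arrival_times_hours, n_hours):
    """Count events in consecutive one-hour bins [0, 1), [1, 2), ...

    arrival_times_hours: event timestamps measured in hours from the start.
    """
    edges = np.arange(n_hours + 1)  # bin edges 0, 1, ..., n_hours
    counts, _ = np.histogram(arrival_times_hours, bins=edges)
    return counts

# Hypothetical irregularly spaced customer arrivals (hours since opening).
arrivals = [0.1, 0.5, 0.95, 1.2, 3.7, 3.9]
counts = hourly_counts(arrivals, n_hours=4)  # -> array([3, 1, 0, 2])
```

After binning, the three arrivals in the first hour are indistinguishable from any other three arrivals in that hour: the within-interval timing is lost, which is the information a continuous-time model would retain.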
Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
Lee, Gilwoo, Choudhury, Sanjiban, Hou, Brian, Srinivasa, Siddhartha S.
We present the first PAC optimal algorithm for Bayes-Adaptive Markov Decision Processes (BAMDPs) in continuous state and action spaces, to the best of our knowledge. The BAMDP framework elegantly addresses model uncertainty by incorporating Bayesian belief updates into long-term expected return. However, computing an exact optimal Bayesian policy is intractable. Our key insight is to compute a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function. We prove the near-optimality of our algorithm and analyze a number of schemes that boost the algorithm's efficiency. Finally, we empirically validate our approach on a number of discrete and continuous BAMDPs and show that the learned policy has consistently competitive performance against baseline approaches.
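To illustrate how Lipschitz continuity lets a finite sample set say something about a continuous space, the sketch below computes an upper bound on a function value at a query point from nearby samples. It is a generic construction, not the Bayes-CPACE algorithm itself; the Euclidean metric, the cap `value_cap`, and all names are assumptions.

```python
import numpy as np

def lipschitz_upper_bound(query, sample_points, sample_values,
                          lipschitz_const, value_cap):
    """Upper bound on f(query) for an L-Lipschitz function f.

    Uses f(x) <= f(x_i) + L * ||x - x_i|| for every sample x_i,
    then caps the result at a known maximum value.
    """
    query = np.asarray(query, dtype=float)
    pts = np.asarray(sample_points, dtype=float)
    vals = np.asarray(sample_values, dtype=float)
    dists = np.linalg.norm(pts - query, axis=1)  # Euclidean distance (assumed metric)
    return float(min(value_cap, np.min(vals + lipschitz_const * dists)))

# Hypothetical samples in a 2-D space with known values.
points = [[0.0, 0.0], [1.0, 0.0]]
values = [1.0, 2.0]
near = lipschitz_upper_bound([0.5, 0.0], points, values,
                             lipschitz_const=2.0, value_cap=5.0)
far = lipschitz_upper_bound([5.0, 0.0], points, values,
                            lipschitz_const=2.0, value_cap=5.0)
```

Near the samples the bound is tight and informative; far from them it falls back to the cap, which is why the samples need to cover the space.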