 Reinforcement Learning


fabiopardo/tonic

#artificialintelligence

Welcome to the Tonic deep reinforcement learning library. Modularity: Building blocks for creating RL agents, such as models, replays, or exploration strategies, are implemented as configurable modules. Readability: Agents are written simply, share an identical API, and display their logs on the terminal with a progress bar. Fair comparison: A single training pipeline is compatible with all Tonic agents and environments. Agents are defined by their core ideas, while general tricks and improvements such as non-terminal timeouts, observation normalization, and action scaling are shared.
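One of the shared tricks named above, non-terminal timeouts, can be illustrated with a minimal sketch. This is not Tonic's code: the function `td_target` and its parameters are hypothetical names chosen for illustration. The idea is that an episode cut off by a time limit has not reached a terminal state, so the learning target should still bootstrap from the value of the next state.

```python
def td_target(reward, next_value, terminated, discount=0.99):
    """One-step TD target (hypothetical helper, not part of Tonic).

    Bootstraps from next_value unless the episode truly terminated.
    A time-limit cutoff is not a terminal state, so callers pass
    terminated=False for it and the bootstrapped value is kept.
    """
    if terminated:
        return reward
    return reward + discount * next_value


# Timeout at the step limit: still bootstrap from the next state's value.
timeout_target = td_target(reward=1.0, next_value=10.0, terminated=False)

# True termination (for example, the agent fell): no future value is added.
terminal_target = td_target(reward=1.0, next_value=10.0, terminated=True)
```

Treating a timeout as a true termination would teach the agent that the state just before the cutoff has no future value, which biases value estimates in tasks that are only truncated for convenience.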


Online Multi-modal Person Search in Videos

arXiv.org Artificial Intelligence

The task of searching for specific people in videos has seen increasing potential in real-world applications, such as video organization and editing. Most existing approaches are devised to work in an offline manner, where identities can only be inferred after an entire video is examined. This precludes such methods from being applied to online services or to applications that require real-time responses. In this paper, we propose an online person search framework, which can recognize people in a video on the fly. This framework maintains a multimodal memory bank at its heart as the basis for person recognition, and updates it dynamically with a policy obtained by reinforcement learning. Our experiments on a large movie dataset show that the proposed method is effective, not only achieving remarkable improvements over online schemes but also outperforming offline methods.


Hierarchial Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection

arXiv.org Artificial Intelligence

This work is inspired by recent advances in hierarchical reinforcement learning (HRL) (Barto and Mahadevan 2003; Hengst 2010), and improvements in learning efficiency with heuristic-based subgoal selection and hindsight experience replay (HER) (Andrychowicz et al. 2017; Levy et al. 2019). We propose a new method to integrate HRL, HER and effective subgoal selection based on human expertise to support sample-efficient learning and enhance interpretability of the agent's behavior. Human expertise remains indispensable in many areas such as medicine (Buch, Ahmed, and Maruthappu 2018) and law (Cath 2018), where interpretability, explainability and transparency are crucial in the decision making process, for ethical and legal reasons. Our method simplifies the complex task sets for achieving the overall objectives by decomposing into subgoals at different levels of abstraction. Incorporating relevant subjective knowledge also significantly reduces the computational resources spent in exploration for RL, especially in high speed, changing, and complex environments where the transition dynamics cannot be effectively learned and modelled in a short time. Experimental results in two StarCraft II (SC2) minigames demonstrate that our method can achieve better sample efficiency than flat and end-to-end RL methods, and provide an effective method for explaining the agent's performance.
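The hindsight experience replay idea mentioned in the abstract can be sketched as follows. This is a generic illustration of goal relabeling using the last achieved state as the substitute goal, not the method proposed in the paper. The episode layout, `relabel_with_final_goal`, and `sparse_reward` are hypothetical names assumed for the example.

```python
def sparse_reward(achieved, goal):
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_goal(episode, reward_fn):
    """Return a copy of the episode relabeled as if the last achieved
    state had been the goal all along (hindsight relabeling)."""
    final_achieved = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        new_step = dict(step)  # copy so the original episode is untouched
        new_step["goal"] = final_achieved
        new_step["reward"] = reward_fn(step["achieved"], final_achieved)
        relabeled.append(new_step)
    return relabeled


# A failed episode: the agent wanted to reach 5 but only got to 3.
episode = [
    {"state": 0, "action": 1, "achieved": 1, "goal": 5, "reward": -1.0},
    {"state": 1, "action": 1, "achieved": 2, "goal": 5, "reward": -1.0},
    {"state": 2, "action": 1, "achieved": 3, "goal": 5, "reward": -1.0},
]
hindsight = relabel_with_final_goal(episode, sparse_reward)
```

Storing both the original and the relabeled transitions gives the learner at least one successful trajectory per episode, which is the source of the sample-efficiency gain usually attributed to this technique.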


One for Many: Transfer Learning for Building HVAC Control

arXiv.org Artificial Intelligence

The design of building heating, ventilation, and air conditioning (HVAC) systems is critically important, as HVAC accounts for around half of building energy consumption and directly affects occupant comfort, productivity, and health. Traditional HVAC control methods are typically based on explicit physical models of building thermal dynamics, which often require significant effort to develop and struggle to achieve sufficient accuracy and efficiency for runtime building control and sufficient scalability for field implementations. Recently, deep reinforcement learning (DRL) has emerged as a promising data-driven method that provides good control performance without analyzing physical models at runtime. However, a major challenge to DRL (and many other data-driven learning methods) is the long training time it takes to reach the desired performance. In this work, we present a novel transfer-learning-based approach to overcome this challenge. Our approach can effectively transfer a DRL-based HVAC controller trained for the source building to a controller for the target building with minimal effort and improved performance, by decomposing the design of the neural network controller into a transferable front-end network that captures building-agnostic behavior and a back-end network that can be efficiently trained for each specific building. We conducted experiments on a variety of transfer scenarios between buildings with different sizes, numbers of thermal zones, materials and layouts, air conditioner types, and ambient weather conditions. The experimental results demonstrated the effectiveness of our approach in significantly reducing the training time, energy cost, and temperature violations.


Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

arXiv.org Machine Learning

Imitation learning (IL, Schaal, 1999; Osa et al., 2018) and inverse reinforcement learning (IRL, Ng and Russell, 2000) are two related areas of research that aim to teach agents by providing demonstrations of the desired behavior. Whereas imitation learning aims to learn a policy that results in a similar behavior, inverse reinforcement learning focuses on inferring a reward function that might have been optimized by the demonstrator, aiming to better generalize to different environments. Both areas of research are often formalized as distribution-matching, that is, the learned policy (or the optimal policy for IRL) should induce a distribution over states and actions that is close to the expert's distribution with respect to a given (usually non-metric) distance. Commonly applied distances are the forward Kullback-Leibler (KL) divergence (e.g., Ziebart, 2010), which maximizes the likelihood of the demonstrated state-action pairs under the agent's distribution, and the reverse Kullback-Leibler (RKL) divergence (e.g., Arenz et al., 2016; Fu et al., 2018; Ghasemipour et al., 2020) which minimizes the expected discrimination information (Kullback and Leibler, 1951) of state-action pairs sampled from the agent's distribution. However, since the emergence of generative adversarial networks (GANs, Goodfellow et al., 2014) as a solution technique for both areas, other divergences have been investigated such as the Jensen-Shannon divergence (Ho and Ermon, 2016), the Wasserstein distance (Xiao et al., 2019) and general f-divergences (Ke et al., 2019; Ghasemipour et al., 2020).
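The asymmetry between the forward and reverse KL divergences described above can be seen on a small discrete example. This is a minimal sketch with made-up distributions over three state-action pairs; `expert` and `agent` are illustrative values, not data from the paper.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is assumed positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three state-action pairs.
expert = [0.7, 0.2, 0.1]
agent = [0.4, 0.4, 0.2]

# Forward KL: expectation under the expert's distribution. Minimizing it
# over the agent maximizes the likelihood of the expert's samples.
forward_kl = kl_divergence(expert, agent)

# Reverse KL: expectation under the agent's own distribution.
reverse_kl = kl_divergence(agent, expert)
```

The two values differ because each divergence weights the log-ratio by a different distribution, which is why the choice of divergence changes the behavior the learned policy is pushed toward.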


Artificial Intelligence Masterclass

#artificialintelligence

Udemy online course: Artificial Intelligence Masterclass. Enter the new era of Hybrid AI Models optimized by Deep NeuroEvolution, with a complete toolkit of ML, DL & AI models. Created by Hadelin de Ponteves, Kirill Eremenko, and the SuperDataScience Team. English, Italian [Auto]. Description: Today, we are bringing you the king of our AI courses: the Artificial Intelligence MASTERCLASS. Are you keen on Artificial Intelligence? Do you want to learn to build the most powerful AI model developed so far and even play against it? Sounds tempting, right? Then the Artificial Intelligence Masterclass course is the right choice for you. This ultimate AI toolbox is all you need to nail it down with ease. You will get a 10-hour step-by-step guide and the full roadmap, which will help you build your own Hybrid AI Model from scratch.


Markov Decision Process

#artificialintelligence

A machine learning algorithm may be tasked with an optimization problem. Using reinforcement learning, the algorithm attempts to optimize the actions it takes within an environment in order to maximize the cumulative reward. Where supervised learning techniques require correct input/output pairs to create a model, reinforcement learning frames the problem as a Markov decision process and learns from reward feedback, balancing exploration and exploitation. Reinforcement learning is useful when the transition probabilities and rewards of the Markov decision process are unspecified or unknown.
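To make the Markov decision process concrete, here is a minimal sketch of a two-state MDP solved with value iteration. Note the assumption: value iteration requires the transition probabilities and rewards to be known, unlike the reinforcement learning setting described above, so this only illustrates the structure of the problem an agent would otherwise have to learn by interaction. The states, actions, and numbers are invented for the example.

```python
def q_value(outcomes, values, discount):
    """Expected return of one action: sum over (prob, next_state, reward)."""
    return sum(p * (r + discount * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, discount=0.9, tolerance=1e-8):
    """Compute optimal state values and a greedy policy for a known MDP."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(q_value(o, values, discount) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            break
    policy = {}
    for s, actions in transitions.items():
        policy[s] = max(
            actions, key=lambda a: q_value(actions[a], values, discount)
        )
    return values, policy


# transitions[state][action] = list of (probability, next_state, reward).
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "move": [(1.0, 0, 0.0)]},
}
values, policy = value_iteration(transitions)
```

In this toy problem the best behavior is to move out of state 0 and then stay in state 1, where a reward of 1 is collected at every step.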



Managing caching strategies for stream reasoning with reinforcement learning

arXiv.org Artificial Intelligence

Efficient decision-making over continuously changing data is essential for many application domains such as cyber-physical systems, industry digitalization, etc. Modern stream reasoning frameworks allow one to model and solve various real-world problems using incremental and continuous evaluation of programs as new data arrives in the stream. Applied techniques use, e.g., Datalog-like materialization or truth maintenance algorithms to avoid costly re-computations, thus ensuring low latency and high throughput of a stream reasoner. However, the expressiveness of existing approaches is quite limited and, e.g., they cannot be used to encode problems with constraints, which often appear in practice. In this paper, we suggest a novel approach that uses the Conflict-Driven Constraint Learning (CDCL) to efficiently update legacy solutions by using intelligent management of learned constraints. In particular, we study the applicability of reinforcement learning to continuously assess the utility of learned constraints computed in previous invocations of the solving algorithm for the current one. Evaluations conducted on real-world reconfiguration problems show that providing a CDCL algorithm with relevant learned constraints from previous iterations results in significant performance improvements of the algorithm in stream reasoning scenarios.


SafePILCO: a software tool for safe and data-efficient policy synthesis

arXiv.org Machine Learning

SafePILCO is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the known PILCO algorithm, originally written in MATLAB, to support safe learning. We provide a Python implementation and leverage existing libraries that allow the codebase to remain short and modular, which is appropriate for wider use by the verification, reinforcement learning, and control communities.