AITopics

In recent years deep neural networks have been successfully applied to the domains of reinforcement learning \cite{bengio2009learning,krizhevsky2012imagenet,hinton2006reducing}. Deep reinforcement learning \cite{mnih2015human} is reported to have the advantage of learning effective policies directly from high-dimensional sensory inputs over traditional agents. However, within the scope of the literature, there is no fundamental change or improvement on the existing training framework. Here we propose a novel training framework that is conceptually comprehensible and potentially easy to be generalized to all feasible algorithms for reinforcement learning. We employ Monte-carlo sampling to achieve raw data inputs, and train them in batch to achieve Markov decision process sequences and synchronously update the network parameters instead of experience replay. This training framework proves to optimize the unbiased approximation of loss function whose estimation exactly matches the real probability distribution data inputs follow, and thus have overwhelming advantages of sample efficiency and convergence rate over existing deep reinforcement learning after evaluating it on both discrete action spaces and continuous control problems. Besides, we propose several algorithms embedded with our new framework to deal with typical discrete and continuous scenarios. These algorithms prove to be far more efficient than their original versions under the framework of deep reinforcement learning, and provide examples for existing and future algorithms to generalize to our new framework.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2005.07782

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Sancheti, Abhilasha, Krishna, Kundan, Srinivasan, Balaji Vasan, Natarajan, Anandhavelu

Reinforced Rewards Framework for Text Style Transfer

Style transfer deals with the algorithms to transfer the stylistic properties of a piece of text into that of another while ensuring that the core content is preserved. There has been a lot of interest in the field of text style transfer due to its wide application to tailored text generation. Existing works evaluate the style transfer models based on content preservation and transfer strength. In this work, we propose a reinforcement learning based framework that directly rewards the framework on these target metrics yielding a better transfer of the target style. We show the improved performance of our proposed framework based on automatic and human evaluation on three independent tasks: wherein we transfer the style of text from formal to informal, high excitement to low excitement, modern English to Shakespearean English, and vice-versa in all the three cases. Improved performance of the proposed framework over existing state-of-the-art frameworks indicates the viability of the approach.

machine learning, natural language, reinforcement learning, (15 more...)

2005.05256

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

Satsangi, Yash, Lim, Sungsu, Whiteson, Shimon, Oliehoek, Frans, White, Martha

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL), problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent, by offering a simple insight that maximizing any convex function of the belief of the agent can be approximated by instead maximizing a prediction reward: a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields using prediction rewards---namely visual attention, question answering systems, and intrinsic motivation---and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight we present deep anticipatory networks (DANs), which enables an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on fashion MNIST and MNIST digit classification.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2005.04912

Country:

North America > Canada > Alberta (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Delay-Aware Model-Based Reinforcement Learning for Continuous Control

Chen, Baiming, Xu, Mengdi, Li, Liang, Zhao, Ding

Action delays degrade the performance of reinforcement learning in many real-world systems. This paper proposes a formal definition of delay-aware Markov Decision Process and proves it can be transformed into standard MDP with augmented states using the Markov reward process. We develop a delay-aware model-based reinforcement learning framework that can incorporate the multi-step delay into the learned system models without learning effort. Experiments with the Gym and MuJoCo platforms show that the proposed delay-aware model-based algorithm is more efficient in training and transferable between systems with various durations of delay compared with off-policy model-free reinforcement learning methods. Codes available at: https://github.com/baimingc/dambrl.

action delay, machine learning, reinforcement learning, (13 more...)

2005.0544

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.68)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Luo, Ying-Sheng, Soeseno, Jonathan Hans, Chen, Trista Pei-Chun, Chen, Wei-Chao

CARL: Controllable Agent with Reinforcement Learning for Quadruped Locomotion

arXiv.org Machine LearningMay-10-2020

Motion synthesis in a dynamic environment has been a long-standing problem for character animation. Methods using motion capture data tend to scale poorly in complex environments because of their larger capturing and labeling requirement. Physics-based controllers are effective in this regard, albeit less controllable. In this paper, we present CARL, a quadruped agent that can be controlled with high-level directives and react naturally to dynamic environments. Starting with an agent that can imitate individual animation clips, we use Generative Adversarial Networks to adapt high-level controls, such as speed and heading, to action distributions that correspond to the original animations. Further fine-tuning through the deep reinforcement learning enables the agent to recover from unseen external perturbations while producing smooth transitions. It then becomes straightforward to create autonomous agents in dynamic environments by adding navigation modules over the entire process. We evaluate our approach by measuring the agent's ability to follow user control and provide a visual analysis of the generated motion to show its effectiveness.

controller, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1145/3386569.3392433

2005.03288

Country: Asia > Taiwan (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Lukianykhin, Oleh, Bogodorova, Tetiana

Reinforcement Learning for Thermostatically Controlled Loads Control using Modelica and Python

arXiv.org Machine LearningMay-9-2020

The aim of the project is to investigate and assess opportunities for applying reinforcement learning (RL) for power system control. As a proof of concept (PoC), voltage control of thermostatically controlled loads (TCLs) for power consumption regulation was developed using Modelica-based pipeline. The Q-learning RL algorithm has been validated for deterministic and stochastic initialization of TCLs. The latter modelling is closer to real grid behaviour, which challenges the control development, considering the stochastic nature of load switching. In addition, the paper shows the influence of Q-learning parameters, including discretization of state-action space, on the controller performance.

controller, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2005.04444

Country:

North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Energy > Power Industry (0.94)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wilde, Nils, Kulic, Dana, Smith, Stephen L.

Active Preference Learning using Maximum Regret

arXiv.org Artificial IntelligenceMay-8-2020

We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots. In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences, modeled as a parameterized cost function. Previous approaches present users with alternatives that minimize the uncertainty over the parameters of the cost function. However, different parameters might lead to the same optimal behaviour; as a consequence the solution space is more structured than the parameter space. We exploit this by proposing a query selection that greedily reduces the maximum error ratio over the solution space. In simulations we demonstrate that the proposed approach outperforms other state of the art techniques in both learning efficiency and ease of queries for the user. Finally, we show that evaluating the learning based on the similarities of solutions instead of the similarities of weights allows for better predictions for different scenarios.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2005.04067

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation (0.47)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Kuznetsov, Arsenii, Shvechikov, Pavel, Grishin, Alexander, Vetrov, Dmitry

Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

arXiv.org Artificial IntelligenceMay-8-2020

The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method---Truncated Quantile Critics, TQC,---blends three ideas: distributional representation of a critic, truncation of critics prediction, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrary granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating 25% improvement on the most challenging Humanoid environment.

controlling overestimation bias, machine learning, reinforcement learning, (11 more...)

2005.04269

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Belzner, Lenz, Wirsing, Martin

Synthesizing Safe Policies under Probabilistic Constraints with Reinforcement Learning and Bayesian Model Checking

arXiv.org Artificial IntelligenceMay-8-2020

In this paper we propose Policy Synthesis under probabilistic Constraints (PSyCo), a systematic engineering method for synthesizing safe policies under probabilistic constraints with reinforcement learning and Bayesian model checking. As an implementation of PSyCo we introduce Safe Neural Evolutionary Strategies (SNES). SNES leverages Bayesian model checking while learning to adjust the Lagrangian of a constrained optimization problem derived from a PSyCo specification. We empirically evaluate SNES' ability to synthesize feasible policies in settings with formal safety requirements.

constraint, machine learning, reinforcement learning, (14 more...)

2005.03898

Country:

South America > Brazil > Rio de Janeiro > Niterói (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Kanervisto, Anssi, Karttunen, Janne, Hautamäki, Ville

Playing Minecraft with Behavioural Cloning

arXiv.org Artificial IntelligenceMay-7-2020

MineRL 2019 competition challenged participants to train sample-efficient agents to play Minecraft, by using a dataset of human gameplay and a limit number of steps the environment. We approached this task with behavioural cloning by predicting what actions human players would take, and reached fifth place in the final ranking. Despite being a simple algorithm, we observed the performance of such an approach can vary significantly, based on when the training is stopped. In this paper, we detail our submission to the competition, run further experiments to study how performance varied over training and study how different engineering decisions affected these results.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2005.03374

Country:

Europe > Finland > North Karelia > Joensuu (0.05)
Europe > Sweden > Skåne County > Malmö (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Games > Computer Games (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)