AITopics

2104.05565

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
(45 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Pierson, Brittany Davis, Ventura, Justine, Taylor, Matthew E.

The Atari Data Scraper

Reinforcement learning has made great strides in recent years due to the success of methods using deep neural networks. However, such neural networks act as a black box, obscuring the inner workings. While reinforcement learning has the potential to solve unique problems, a lack of trust and understanding of reinforcement learning algorithms could prevent their widespread adoption. Here, we present a library that attaches a "data scraper" to deep reinforcement learning agents, acting as an observer, and then show how the data collected by the Atari Data Scraper can be used to understand and interpret deep reinforcement learning agents. The code for the Atari Data Scraper can be found here: https: // github.

agent, atari data scraper, reinforcement, (14 more...)

2104.04893

Country:

North America > Canada > Alberta (0.15)
North America > United States > Washington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Chenu, Alexandre, Perrin-Gilbert, Nicolas, Doncieux, Stéphane, Sigaud, Olivier

Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms

Reinforcement learning agents need a reward signal to learn successful policies. When this signal is sparse or the corresponding gradient is deceptive, such agents need a dedicated mechanism to efficiently explore their search space without relying on the reward. Looking for a large diversity of behaviors or using Motion Planning (MP) algorithms are two options in this context. In this paper, we build on the common roots between these two options to investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms. These algorithms look for diversity in an outcome space or behavioral space which is generally hand-designed to represent what matters for a given task. The relation to MP algorithms reveals that the smoothness, or lack of smoothness of the mapping between the policy parameter space and the outcome space plays a key role in the search efficiency. In particular, we show empirically that, if the mapping is smooth enough, i.e. if two close policies in the parameter space lead to similar outcomes, then diversity algorithms tend to inherit exploration properties of MP algorithms. By contrast, if it is not, diversity algorithms lose these properties and their performance strongly depends on specific heuristics, notably filtering mechanisms that discard some of the explored policies.

algorithm, expansion operator, operator, (16 more...)

2104.04768

Country:

North America > United States > New York (0.04)
North America > United States > Iowa (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

Hou, Zhengxu, Liu, Bang, Zhao, Ruihui, Ou, Zijing, Liu, Yafei, Chen, Xi, Zheng, Yefeng

For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL.To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action pairs in real dialogs. In this paper, we propose a multi-level reward modeling approach that factorizes a reward into a three-level hierarchy: domain, act, and slot. Based on inverse adversarial reinforcement learning, our designed reward model can provide more accurate and explainable reward signals for state-action pairs.Extensive evaluations show that our approach can be applied to a wide range of reinforcement learning-based dialog systems and significantly improves both the performance and the speed of convergence.

agent, latexit latexit sha1, latexit sha1, (13 more...)

2104.04748

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Tasrin, Tasmia, Nahian, Md Sultan Al, Perera, Habarakadage, Harrison, Brent

Influencing Reinforcement Learning through Natural Language Guidance

Interactive reinforcement learning agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, this information can be difficult to generalize on its own. In this work, we explore how natural language advice can be used to provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known Interactive reinforcement learning technique. Usually policy shaping employs a human feedback policy to help an agent to learn more about how to achieve its goal. In our case, we replace this human feedback policy with policy generated based on natural language advice. We aim to inspect if the generated natural language reasoning provides support to a deep reinforcement learning agent to decide its actions successfully in any given environment. So, we design our model with three networks: first one is the experience driven, next is the advice generator and third one is the advice driven. While the experience driven reinforcement learning agent chooses its actions being influenced by the environmental reward, the advice driven neural network with generated feedback by the advice generator for any new state selects its actions to assist the reinforcement learning agent to better policy shaping.

agent, human feedback, proceedings, (13 more...)

2104.01506

Country:

North America > United States > Kentucky (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

#artificialintelligenceApr-9-2021, 08:04:13 GMT

Reinforcement Learning for Dynamic Pricing

Limitations on physical interactions throughout the world have reshaped our lives and habits. And while the pandemic has been disrupting the majority of industries, e-commerce has been thriving. This article covers how reinforcement learning for dynamic pricing helps retailers refine their pricing strategies to increase profitability and boost customer engagement and loyalty. In dynamic pricing, we want an agent to set optimal prices based on market conditions. In terms of RL concepts, actions are all of the possible prices and states, market conditions, except for the current price of the product or service.

action value, agent, optimal action value, (14 more...)

#artificialintelligence

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceApr-9-2021, 01:00:58 GMT

[N] Call for papers: KDD 2021 Workshop on Bayesian Causal Inference for Real-World Interactive Systems

Increasingly we use machine learning to build interactive systems that learn from past actions and the reward obtained. Theory suggests several possible approaches, such as contextual bandits, reinforcement learning, the do-calculus, or plain old Bayesian decision theory. What are the most theoretically appropriate and practical approaches to doing causal inference for interactive systems? We are particularly interested in case studies of applying machine learning methods to interactive systems that did or did not use Bayesian or likelihood based methods, with a discussion about why this choice was made in terms of practical or theoretical arguments.

bayesian causal inference, real-world interactive system, workshop, (1 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Fayad, Ammar, Ibrahim, Majd

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

arXiv.org Artificial IntelligenceApr-9-2021

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-policy actor-critic deep RL algorithm. BAC mathematically formulates the behavior of the policy through autoencoders by providing an accurate estimation of how frequently each state-action pair was visited while taking into consideration state dynamics that play a crucial role in determining the trajectories produced by the policy. The agent is encouraged to change its behavior consistently towards less-visited state-action pairs while attaining good performance by maximizing the expected discounted sum of rewards, resulting in an efficient exploration of the environment and good exploitation of all high reward regions. One prominent aspect of our approach is that it is applicable to both stochastic and deterministic actors in contrast to maximum entropy deep reinforcement learning algorithms. Results show considerably better performances of BAC when compared to several cutting-edge learning algorithms.

algorithm, exploration, reinforcement learning, (8 more...)

2104.04424

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Sardinia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceApr-9-2021

Counter-Strike Deathmatch with Large-Scale Behavioural Cloning

Pearce, Tim, Zhu, Jun

This paper describes an AI agent that plays the popular first-person-shooter (FPS) video game'Counter-Strike; Global Offensive' (CSGO) from pixel input. The agent, a deep neural network, matches the performance of the medium difficulty built-in AI on the deathmatch game mode, whilst adopting a humanlike play style. Unlike much prior work in games, no API is available for CSGO, so algorithms must train and run in real-time. This limits the quantity of on-policy data that can be generated, precluding many reinforcement learning algorithms. Our solution uses behavioural cloning -- training on a large noisy dataset scraped from human play on online servers (4 million frames, comparable in size to ImageNet), and a smaller dataset of high-quality expert demonstrations. This scale is an order of magnitude larger than prior work on imitation learning in FPS games.

agent, csgo, dataset, (17 more...)

2104.04258

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Choi, Jinyoung, Dance, Christopher R., Kim, Jung-eun, Hwang, Seulbin, Park, Kyung-sik

Risk-Conditioned Distributional Soft Actor-Critic for Risk-Sensitive Navigation

arXiv.org Artificial IntelligenceApr-9-2021

Modern navigation algorithms based on deep reinforcement learning (RL) show promising efficiency and robustness. However, most deep RL algorithms operate in a risk-neutral manner, making no special attempt to shield users from relatively rare but serious outcomes, even if such shielding might cause little loss of performance. Furthermore, such algorithms typically make no provisions to ensure safety in the presence of inaccuracies in the models on which they were trained, beyond adding a cost-of-collision and some domain randomization while training, in spite of the formidable complexity of the environments in which they operate. In this paper, we present a novel distributional RL algorithm that not only learns an uncertainty-aware policy, but can also change its risk measure without expensive fine-tuning or retraining. Our method shows superior performance and safety over baselines in partially-observed navigation tasks. We also demonstrate that agents trained using our method can adapt their policies to a wide range of risk measures at run-time.

algorithm, rc-dsac, risk measure, (12 more...)

2104.03111

Country:

Europe > France (0.04)
Asia > South Korea (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)