AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
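
As a minimal sketch of the trial-and-error idea in the quotation, the Python snippet below runs an epsilon-greedy learner on a small multi-armed bandit. The function name `run_bandit`, the reward values, and the epsilon-greedy rule are illustrative assumptions, not something quoted from Sutton and Barto or from this page. The learner is never told which arm is best; it finds out by trying arms and averaging the rewards it receives.

```python
import random


def run_bandit(true_rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a bandit with fixed (hypothetical) rewards.

    Returns the learned value estimates, the pull counts per arm, and the
    total reward collected.
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n   # learner's current guess of each arm's reward
    counts = [0] * n        # how often each arm has been tried
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n), key=lambda i: estimates[i])
        reward = true_rewards[arm]  # the only feedback the learner gets
        counts[arm] += 1
        # Incremental sample average of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimates:", estimates)
    print("pull counts:", counts)
```

The rewards here are deterministic to keep the sketch short; with noisy rewards the same update rule applies, but the estimates take longer to settle.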

"It's Unwieldy and It Takes a Lot of Time." Challenges and Opportunities for Creating Agents in Commercial Games

Jacob, Mikhail, Devlin, Sam, Hofmann, Katja

arXiv.org Artificial Intelligence | Sep-1-2020

Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games. As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it is vital that the research community learn from the best practices cultivated within the industry over decades of creating agents. However, although commercial game agent creation pipelines are more mature than those based on ML, opportunities for improvement still abound. As a foundation for shared progress in identifying research opportunities between researchers and practitioners, we interviewed seventeen game agent creators from AAA studios, indie studios, and industrial research labs about the challenges they experienced with their professional workflows. Our study revealed several open challenges ranging from design to implementation and evaluation. We compare with literature from the research community that addresses the challenges identified and conclude by highlighting promising directions for future research supporting agent creation in the games industry.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2009.00541

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Software (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics

Sun, Yanchao, Huang, Furong

arXiv.org Machine Learning | Sep-1-2020

Poisoning attacks, although studied extensively in supervised learning, are not well understood in Reinforcement Learning (RL), especially in deep RL. Prior works on poisoning RL usually either assume the attacker knows the underlying Markov Decision Process (MDP), or directly apply the poisoning methods in supervised learning to RL. In this work, we build a generic poisoning framework for online RL via a comprehensive investigation of heterogeneous types/victims of poisoning attacks in RL, considering the unique challenges in RL such as data no longer being i.i.d. Without any prior knowledge of the MDP, we propose a strategic poisoning algorithm called Vulnerability-Aware Adversarial Critic Poison (VA2C-P), which works for most policy-based deep RL agents, using a novel metric, stability radius in RL, that measures the vulnerability of RL algorithms. Experiments on multiple deep RL agents and multiple environments show that our poisoning algorithm successfully prevents agents from learning a good policy, with a limited attacking budget. Our experimental results demonstrate varying vulnerabilities of different deep RL agents in multiple environments, benefiting the understanding and applications of deep RL under security threat scenarios.

learner, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2009.00774

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Allen Institute open-sources AllenAct, a framework for research in embodied AI

#artificialintelligence | Aug-31-2020, 23:35:12 GMT

Researchers at the Allen Institute for AI today launched AllenAct, a platform intended to promote reproducible research in embodied AI with a focus on modularity and flexibility. AllenAct, which is available in beta, supports multiple training environments and algorithms with tutorials, pretrained models, and out-of-the-box real-time visualizations. Embodied AI, the AI subdomain concerning systems that learn to complete tasks through environmental interactions, has experienced substantial growth. The Allen Institute argues that this growth has been mostly beneficial, but it takes issue with the fragmented nature of embodied AI development tools, which it says discourages good science. In a recent analysis, the Allen Institute found that the number of embodied AI papers now exceeds 160 (up from around 20 in 2018 and 60 in 2019) and that the number of environments, tasks, modalities, and algorithms varies widely among them.

allenact, machine learning, reinforcement learning, (9 more...)

#artificialintelligence

Genre: Instructional Material (0.37)

Industry: Education (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Top 10 Reinforcement Learning Courses & Certifications in 2020

#artificialintelligence | Aug-31-2020, 09:50:10 GMT

Reinforcement Learning is one of the most in-demand research topics, and its popularity is growing day by day. An RL agent learns from experience rather than being explicitly taught, which is essentially trial-and-error learning. To help readers understand RL, Analytics Insight compiles the Top 10 Reinforcement Learning Courses and Certifications in 2020. The reinforcement learning specialization consists of four courses that explore the power of adaptive learning systems and artificial intelligence (AI). In this MOOC, you will learn how Reinforcement Learning (RL) solutions help to solve real-world problems through trial-and-error interaction by implementing a complete RL solution.

computer based training, deep learning, educational technology, (19 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (1.00)
Energy > Oil & Gas (0.50)
Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Google-DeepMind's Dreamer is a Reinforcement Learning Agent that can Solve Long-Horizon Tasks

#artificialintelligence | Aug-31-2020, 06:20:16 GMT

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Deep reinforcement learning (DRL) has been at the center of some of the most important artificial intelligence (AI) breakthroughs of the last decade. Given its dependency on interactions with an environment, DRL is regularly applied to many real-world scenarios such as self-driving vehicles that operate in really complex environments.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Control of a Nature-inspired Scorpion using Reinforcement Learning

Agrawal, Aakriti, Rajashekhar, V S, Arasanipalai, Rohitkumar, Ghose, Debasish

arXiv.org Artificial Intelligence | Aug-31-2020

A terrestrial robot that can maneuver over rough terrain and scout places is very useful in mapping out unknown areas. It can also be used to explore dangerous areas in place of humans. A terrestrial robot modeled after a scorpion will be able to traverse undetected and can be used for surveillance purposes. Therefore, this paper proposes the modelling of a scorpion-inspired robot and a reinforcement learning (RL) based controller for navigation. The robot scorpion uses serial four-bar mechanisms for its leg movements. It also has an active tail and a movable claw. The controller is trained to navigate the robot scorpion to the target waypoint. The simulation results demonstrate efficient navigation of the robot scorpion.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2008.13712

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Pang, Bo, Jiang, Zhong-Ping

arXiv.org Artificial Intelligence | Aug-31-2020

This paper studies the robustness aspect of reinforcement learning algorithms in the presence of errors. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: Under what conditions is the policy iteration method robustly stable for dynamical systems with unbounded, continuous state and action spaces? Using advanced stability results in control theory, it is shown that policy iteration for LQR is inherently robust to small errors and enjoys local input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded, and, moreover, enter and stay in a small neighborhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration for the LQR problem is proposed, when the system dynamics are subjected to additive stochastic disturbances. The proposed new results in robust reinforcement learning are validated by a numerical example.
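
To make the robustness claim in this abstract concrete, here is a minimal sketch of policy iteration for a scalar discrete-time LQR problem, with a bounded error injected into each policy-evaluation step. It assumes the textbook setting x_{t+1} = a*x_t + b*u_t with stage cost q*x^2 + r*u^2 and a linear policy u = -k*x; the function names, the numbers, and the alternating error sequence are hypothetical choices for illustration. It is not the paper's matrix-valued analysis, and it is not the off-policy optimistic least-squares algorithm the paper proposes.

```python
def evaluate_policy(a, b, q, r, k):
    """Policy evaluation: cost coefficient p with J(x) = p * x**2 for u = -k*x."""
    a_cl = a - b * k  # closed-loop dynamics x_{t+1} = a_cl * x_t
    if abs(a_cl) >= 1.0:
        raise ValueError("gain k is not stabilizing, so the cost is infinite")
    # Geometric series: sum over t of (q + r*k^2) * a_cl^(2t)
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def improve_policy(a, b, r, p):
    """Policy improvement: gain minimizing r*u^2 + p*(a*x + b*u)^2 over u."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, n_iters, error=lambda i: 0.0):
    """Run policy iteration; error(i) perturbs the evaluated p at iteration i."""
    k, p_hat = k0, None
    for i in range(n_iters):
        p_hat = evaluate_policy(a, b, q, r, k) + error(i)  # inexact evaluation
        k = improve_policy(a, b, r, p_hat)
    return k, p_hat


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation at p."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    # Exact policy iteration from a stabilizing initial gain k0 = 1.0.
    k_star, p_star = policy_iteration(1.2, 1.0, 1.0, 1.0, k0=1.0, n_iters=30)
    # Same iteration, but every evaluation is off by +/- 0.05.
    k_noisy, _ = policy_iteration(
        1.2, 1.0, 1.0, 1.0, k0=1.0, n_iters=30,
        error=lambda i: 0.05 * (-1) ** i,
    )
    print("exact gain:", k_star, "noisy gain:", k_noisy)
```

In this toy case the perturbed gains stay close to the optimal gain instead of drifting away, which is the qualitative behaviour the abstract describes as iterates entering and staying in a small neighborhood of the optimal solution.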

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2008.11592

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > New York > Kings County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

Zhang, Qifan, Guan, Yue, Tsiotras, Panagiotis

arXiv.org Machine Learning | Aug-31-2020

We explore the use of policy approximation for reducing the computational cost of learning Nash equilibria in multi-agent reinforcement learning scenarios. We propose a new algorithm for zero-sum stochastic games in which each agent simultaneously learns a Nash policy and an entropy-regularized policy. The two policies help each other towards convergence: the former guides the latter to the desired Nash equilibrium, while the latter serves as an efficient approximation of the former. We demonstrate the possibility of using the proposed algorithm to transfer previous training experiences to different environments, enabling the agents to adapt quickly to new tasks. We also provide a dynamic hyper-parameter scheduling scheme for further expedited convergence. Empirical results applied to a number of stochastic games show that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2009.00162

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.94)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

Chung, Wesley, Thomas, Valentin, Machado, Marlos C., Roux, Nicolas Le

arXiv.org Machine Learning | Aug-31-2020

Policy gradient methods are a popular and effective choice to train reinforcement learning agents in complex environments. The variance of the stochastic policy gradient is often seen as a key quantity to determine the effectiveness of the algorithm. Baselines are a common addition to reduce the variance of the gradient, but previous works have hardly ever considered other effects baselines may have on the optimization process. Using simple examples, we find that baselines modify the optimization dynamics even when the variance is the same. In certain cases, a baseline with lower variance may even be worse than another with higher variance. Furthermore, we find that the choice of baseline can affect the convergence of natural policy gradient, where certain baselines may lead to convergence to a suboptimal policy for any stepsize. Such behaviour emerges when sampling is constrained to be done using the current policy and we show how decoupling the sampling policy from the current policy guarantees convergence for a much wider range of baselines. More broadly, this work suggests that a more careful treatment of stochasticity in the updates, beyond the immediate variance, is necessary to understand the optimization process of policy gradient algorithms.
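
The standard view that this abstract starts from, namely that a baseline leaves the expected policy gradient unchanged while altering its variance, can be checked exactly on a small bandit. The sketch below assumes a softmax policy over three arms with fixed rewards; the names `softmax` and `grad_stats` and all numbers are hypothetical. It illustrates only that starting premise, not the paper's findings about optimization dynamics or natural policy gradient.

```python
import math


def softmax(theta):
    """Softmax policy over arms, computed stably."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_stats(theta, rewards, baseline):
    """Exact mean and total variance of the estimator (r(a) - b) * grad log pi(a).

    The expectation is taken over a ~ pi by enumerating every arm, so no
    sampling noise is involved.
    """
    pi = softmax(theta)
    n = len(theta)
    samples = []
    for a in range(n):
        # Score function of a softmax policy: grad_theta log pi(a) = e_a - pi
        score = [(1.0 if i == a else 0.0) - pi[i] for i in range(n)]
        samples.append([(rewards[a] - baseline) * s for s in score])
    mean = [sum(pi[a] * samples[a][i] for a in range(n)) for i in range(n)]
    var = sum(
        pi[a] * sum((samples[a][i] - mean[i]) ** 2 for i in range(n))
        for a in range(n)
    )
    return mean, var


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]
    rewards = [1.0, 0.7, 0.0]
    pi = softmax(theta)
    value = sum(p * r for p, r in zip(pi, rewards))  # expected reward under pi
    for b in (0.0, value):
        mean, var = grad_stats(theta, rewards, b)
        print("baseline", b, "mean", mean, "variance", var)
```

Running it shows the same mean gradient for both baselines and a smaller variance when the baseline equals the expected reward.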

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2008.13773

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Ranking Policy Decisions

Pouget, Hadrien, Chockler, Hana, Sun, Youcheng, Kroening, Daniel

arXiv.org Machine Learning | Aug-31-2020

Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them more difficult to analyse and interpret. In a run with $n$ time steps, a policy will decide $n$ times on an action to take, even when only a tiny subset of these decisions delivers value over selecting a simple default action. Given a pre-trained policy, we propose a black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states. We evaluate our ranking method by creating new, simpler policies by pruning decisions identified as unimportant, and measure the impact on performance. Our experimental results on a diverse set of standard benchmarks (gridworld, CartPole, Atari games) show that in some cases fewer than half of the decisions made contribute to the expected reward. We furthermore show that the decisions made in the most frequently visited states are not the most important for the expected reward.

machine learning, natural language, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2008.13607

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.57)
Transportation > Air (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
