 Reinforcement Learning


"It's Unwieldy and It Takes a Lot of Time." Challenges and Opportunities for Creating Agents in Commercial Games

arXiv.org Artificial Intelligence

Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games. As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it is vital that the research community learn from the best practices cultivated within the industry over decades of creating agents. However, although commercial game agent creation pipelines are more mature than those based on ML, opportunities for improvement still abound. As a foundation for shared progress in identifying research opportunities between researchers and practitioners, we interviewed seventeen game agent creators from AAA studios, indie studios, and industrial research labs about the challenges they experienced with their professional workflows. Our study revealed several open challenges ranging from design to implementation and evaluation. We compare these with literature from the research community that addresses the challenges identified and conclude by highlighting promising directions for future research supporting agent creation in the games industry.


Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics

arXiv.org Machine Learning

Although poisoning attacks have been studied extensively in supervised learning, they are not well understood in Reinforcement Learning (RL), especially in deep RL. Prior works on poisoning RL usually either assume the attacker knows the underlying Markov Decision Process (MDP), or directly apply the poisoning methods in supervised learning to RL. In this work, we build a generic poisoning framework for online RL via a comprehensive investigation of heterogeneous types/victims of poisoning attacks in RL, considering the unique challenges in RL such as data no longer being i.i.d. Without any prior knowledge of the MDP, we propose a strategic poisoning algorithm called Vulnerability-Aware Adversarial Critic Poison (VA2C-P), which works for most policy-based deep RL agents, using a novel metric, stability radius in RL, that measures the vulnerability of RL algorithms. Experiments on multiple deep RL agents and multiple environments show that our poisoning algorithm successfully prevents agents from learning a good policy, with a limited attacking budget. Our experimental results demonstrate varying vulnerabilities of different deep RL agents in multiple environments, benefiting the understanding and applications of deep RL under security threat scenarios.


Allen Institute open-sources AllenAct, a framework for research in embodied AI

#artificialintelligence

Researchers at the Allen Institute for AI today launched AllenAct, a platform intended to promote reproducible research in embodied AI with a focus on modularity and flexibility. AllenAct, which is available in beta, supports multiple training environments and algorithms with tutorials, pretrained models, and out-of-the-box real-time visualizations. Embodied AI, the AI subdomain concerning systems that learn to complete tasks through environmental interactions, has experienced substantial growth. The Allen Institute argues that this growth has been mostly beneficial, but it takes issue with the fragmented nature of embodied AI development tools, which it says discourages good science. In a recent analysis, the Allen Institute found that the number of embodied AI papers now exceeds 160 (up from around 20 in 2018 and 60 in 2019) and that the number of environments, tasks, modalities, and algorithms varies widely among them.


Top 10 Reinforcement Learning Courses & Certifications in 2020

#artificialintelligence

Reinforcement Learning is one of the most in-demand research topics, and its popularity is growing day by day. An RL agent learns from experience, rather than being explicitly taught, which is essentially trial-and-error learning. To help readers understand RL, Analytics Insight has compiled the Top 10 Reinforcement Learning Courses and Certifications in 2020. The reinforcement learning specialization consists of four courses that explore the power of adaptive learning systems and artificial intelligence (AI). In this MOOC, you will learn how Reinforcement Learning (RL) solutions help to solve real-world problems through trial-and-error interaction by implementing a complete RL solution.


Google-DeepMind's Dreamer is a Reinforcement Learning Agent that can Solve Long-Horizon Tasks

#artificialintelligence

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Deep reinforcement learning (DRL) has been at the center of some of the most important artificial intelligence (AI) breakthroughs of the last decade. Given its dependency on interactions with an environment, DRL is regularly applied to many real-world scenarios such as self-driving vehicles that operate in highly complex environments.


Control of a Nature-inspired Scorpion using Reinforcement Learning

arXiv.org Artificial Intelligence

A terrestrial robot that can maneuver over rough terrain and scout places is very useful in mapping out unknown areas. It can also be used to explore dangerous areas in place of humans. A terrestrial robot modeled after a scorpion will be able to traverse undetected and can be used for surveillance purposes. Therefore, this paper proposes the modelling of a scorpion-inspired robot and a reinforcement learning (RL)-based controller for navigation. The robot scorpion uses serial four-bar mechanisms for its leg movements. It also has an active tail and a movable claw. The controller is trained to navigate the robot scorpion to the target waypoint. The simulation results demonstrate efficient navigation of the robot scorpion.


Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

arXiv.org Artificial Intelligence

This paper studies the robustness aspect of reinforcement learning algorithms in the presence of errors. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: Under what conditions is the policy iteration method robustly stable for dynamical systems with unbounded, continuous state and action spaces? Using advanced stability results in control theory, it is shown that policy iteration for LQR is inherently robust to small errors and enjoys local input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded, and, moreover, enter and stay in a small neighborhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration for the LQR problem is proposed, when the system dynamics are subjected to additive stochastic disturbances. The proposed new results in robust reinforcement learning are validated by a numerical example.
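To make the idea of policy iteration for LQR concrete, here is a minimal sketch for a scalar system, not the paper's algorithm or its off-policy variant. It assumes dynamics x' = a*x + b*u, stage cost q*x^2 + r*u^2, and a linear policy u = -k*x. The function names, the numeric values, and the `eval_error` hook (a hypothetical bounded perturbation added to each policy evaluation) are illustrative assumptions only.

```python
def lqr_policy_iteration(a, b, q, r, k0, iters=20, eval_error=lambda i: 0.0):
    """Scalar policy iteration for x' = a*x + b*u with policy u = -k*x.

    eval_error(i) is a hypothetical perturbation added to the value
    estimate at iteration i, used to mimic an inexact evaluation step.
    """
    k = k0
    p = None
    for i in range(iters):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("gain is not stabilizing")
        # Policy evaluation: p solves p = q + r*k^2 + (a - b*k)^2 * p.
        p = (q + r * k * k) / (1.0 - closed_loop ** 2)
        # Inject the (assumed) bounded evaluation error.
        p += eval_error(i)
        # Policy improvement: minimise r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return p, k


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation."""
    return p - (q + a * a * p - (a * b * p) ** 2 / (r + b * b * p))


# Illustrative values (assumptions): an unstable open loop with a = 1.2.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_exact, k_exact = lqr_policy_iteration(a, b, q, r, k0=1.0)
p_noisy, k_noisy = lqr_policy_iteration(
    a, b, q, r, k0=1.0, eval_error=lambda i: 0.01 * (-1) ** i
)
print("exact:", p_exact, k_exact)
print("noisy:", p_noisy, k_noisy)
```

In this toy setting the perturbed run stays close to the exact solution as long as the injected error is small, which is the kind of behaviour the abstract describes; the sketch says nothing about the general matrix case or the paper's stability proof.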


Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation

arXiv.org Machine Learning

We explore the use of policy approximation for reducing the computational cost of learning Nash equilibria in multi-agent reinforcement learning scenarios. We propose a new algorithm for zero-sum stochastic games in which each agent simultaneously learns a Nash policy and an entropy-regularized policy. The two policies help each other towards convergence: the former guides the latter to the desired Nash equilibrium, while the latter serves as an efficient approximation of the former. We demonstrate the possibility of using the proposed algorithm to transfer previous training experiences to different environments, enabling the agents to adapt quickly to new tasks. We also provide a dynamic hyper-parameter scheduling scheme for further expedited convergence. Empirical results applied to a number of stochastic games show that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms.


Beyond variance reduction: Understanding the true impact of baselines on policy optimization

arXiv.org Machine Learning

Policy gradient methods are a popular and effective choice to train reinforcement learning agents in complex environments. The variance of the stochastic policy gradient is often seen as a key quantity to determine the effectiveness of the algorithm. Baselines are a common addition to reduce the variance of the gradient, but previous works have hardly ever considered other effects baselines may have on the optimization process. Using simple examples, we find that baselines modify the optimization dynamics even when the variance is the same. In certain cases, a baseline with lower variance may even be worse than another with higher variance. Furthermore, we find that the choice of baseline can affect the convergence of natural policy gradient, where certain baselines may lead to convergence to a suboptimal policy for any stepsize. Such behaviour emerges when sampling is constrained to be done using the current policy and we show how decoupling the sampling policy from the current policy guarantees convergence for a much wider range of baselines. More broadly, this work suggests that a more careful treatment of stochasticity in the updates, beyond the immediate variance, is necessary to understand the optimization process of policy gradient algorithms.
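The standard role of a baseline can be illustrated with a small sketch. It assumes a three-armed bandit with deterministic rewards and a softmax policy, and it computes the mean and total variance of the single-sample estimator (r_a - b) * grad log pi(a) exactly by enumerating actions. The reward values and function names are hypothetical, and the sketch only shows the textbook variance effect, not the optimization-dynamics effects the paper studies.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def gradient_mean_and_variance(theta, rewards, baseline):
    """Exact mean and total variance of (r_a - b) * grad log pi(a)."""
    pi = softmax(theta)
    n = len(theta)
    samples = []
    for a in range(n):
        # Score function of a softmax policy: e_a - pi.
        score = [(1.0 if j == a else 0.0) - pi[j] for j in range(n)]
        samples.append([(rewards[a] - baseline) * s for s in score])
    # Expectation over actions drawn from the current policy.
    mean = [sum(pi[a] * samples[a][j] for a in range(n)) for j in range(n)]
    second_moment = sum(pi[a] * sum(g * g for g in samples[a]) for a in range(n))
    variance = second_moment - sum(m * m for m in mean)
    return mean, variance


# Illustrative values (assumptions): uniform initial policy, fixed rewards.
theta = [0.0, 0.0, 0.0]
rewards = [1.0, 0.7, 0.0]
expected_reward = sum(p * r for p, r in zip(softmax(theta), rewards))

mean_none, var_none = gradient_mean_and_variance(theta, rewards, 0.0)
mean_base, var_base = gradient_mean_and_variance(theta, rewards, expected_reward)
print("no baseline:  ", mean_none, var_none)
print("with baseline:", mean_base, var_base)
```

Both estimators have the same expected gradient, because the baseline term multiplies the expected score, which is zero; only the variance differs in this example.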


Ranking Policy Decisions

arXiv.org Machine Learning

Policies trained via Reinforcement Learning (RL) are often needlessly complex, making them more difficult to analyse and interpret. In a run with $n$ time steps, a policy will decide $n$ times on an action to take, even when only a tiny subset of these decisions delivers value over selecting a simple default action. Given a pre-trained policy, we propose a black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states. We evaluate our ranking method by creating new, simpler policies by pruning decisions identified as unimportant, and measure the impact on performance. Our experimental results on a diverse set of standard benchmarks (gridworld, CartPole, Atari games) show that in some cases less than half of the decisions made contribute to the expected reward. We furthermore show that the decisions made in the most frequently visited states are not the most important for the expected reward.
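A rough sketch of how a fault-localisation score could rank states is shown below. The abstract does not say which measure is used, so the Ochiai score here is an assumption, as are the run format (a set of states where the default action replaced the policy's decision, plus a pass/fail flag), the state names, and the function name. A state scores highly when overriding its decision tends to coincide with failed runs.

```python
import math


def rank_states(runs):
    """Rank states by an Ochiai-style score (an assumed measure).

    runs: list of (mutated_states, passed) pairs, where mutated_states is
    the set of states in which the default action was used instead of the
    policy's decision, and passed says whether the run still met its goal.
    """
    total_failed = sum(1 for _, passed in runs if not passed)
    states = set().union(*(mutated for mutated, _ in runs))
    scores = {}
    for s in states:
        # Runs where s was overridden and the run failed / passed.
        ef = sum(1 for mutated, passed in runs if s in mutated and not passed)
        ep = sum(1 for mutated, passed in runs if s in mutated and passed)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom > 0 else 0.0
    # Highest score first; ties are broken by state name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical run log: overriding the decision in "s1" always fails.
runs = [
    ({"s1"}, False),
    ({"s2"}, True),
    ({"s1", "s2"}, False),
    ({"s3"}, True),
    ({"s2", "s3"}, True),
]
ranking = rank_states(runs)
print(ranking)
```

In this made-up log, "s1" ranks first, so a pruned policy would keep the trained decision there and could fall back to the default action in low-ranked states such as "s3".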