AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


"Other-Play" for Zero-Shot Coordination

Hu, Hengyuan, Lerer, Adam, Peysakhovich, Alex, Foerster, Jakob

arXiv.org Artificial Intelligence, Mar-9-2020

We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by repeatedly playing the game with themselves. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP both theoretically and experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results, we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
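The difference between the SP and OP objectives can be seen in a tiny, hypothetical lever-coordination game (our own construction for illustration, not taken from the abstract): nine levers pay 1.0 and one unique lever pays 0.9, and both players score only if they pull the same lever. The sketch assumes that levers with equal payoff are interchangeable symmetries and that OP relabels the partner by a uniformly random permutation within each equal-payoff class; the function name `sp_and_op_values` is hypothetical.

```python
from collections import Counter


def sp_and_op_values(payoffs):
    """Score each pure convention 'both players pull lever i'.

    Self-play (SP): the partner is an exact copy, so the convention pays payoffs[i].
    Other-play (OP): the partner's policy is relabelled by a random symmetry.
    ASSUMPTION: levers with equal payoff are interchangeable, and the symmetry is a
    uniformly random permutation within each equal-payoff class, so the relabelled
    partner lands on lever i with probability 1 / (size of i's class).
    """
    class_size = Counter(payoffs)
    sp = list(payoffs)
    op = [p / class_size[p] for p in payoffs]
    return sp, op


# Hypothetical game: nine 1.0 levers and one unique 0.9 lever.
payoffs = [1.0] * 9 + [0.9]
sp, op = sp_and_op_values(payoffs)
best_sp = max(range(len(payoffs)), key=sp.__getitem__)
best_op = max(range(len(payoffs)), key=op.__getitem__)
print(best_sp, round(sp[best_sp], 3))  # 0 1.0  (an arbitrary 1.0 lever)
print(best_op, round(op[best_op], 3))  # 9 0.9  (the unique lever)
```

Under SP any of the nine 1.0 levers looks optimal, but which one is chosen is an arbitrary convention that an independently trained partner has only a 1/9 chance of sharing; the OP score makes that fragility explicit and favours the lever that no symmetry can relabel.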

agent, coordination, symmetry

arXiv:2003.02979

Country: North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)


Reinforcement-learning AIs are vulnerable to a new kind of attack

#artificialintelligence, Mar-8-2020, 08:08:10 GMT

The soccer bot lines up to take a shot at the goal. But instead of getting ready to block it, the goalkeeper drops to the ground and wiggles its legs. Confused, the striker does a weird little sideways dance, stamping its feet and waving one arm, and then falls over. It's not a tactic you'll see used by the pros, but it shows that an artificial intelligence trained via deep reinforcement learning (the technique behind cutting-edge game-playing AIs like AlphaZero and OpenAI Five) is more vulnerable to attack than previously thought. And that could have serious consequences.

adversarial policy, adversary, reinforcement


Country:

North America > United States > California > Alameda County > Berkeley (0.05)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)

Industry: Leisure & Entertainment > Sports > Soccer (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


On the Robustness of Cooperative Multi-Agent Reinforcement Learning

Lin, Jieyu, Dzeparoska, Kristina, Zhang, Sai Qian, Leon-Garcia, Alberto, Papernot, Nicolas

arXiv.org Machine Learning, Mar-8-2020

In cooperative multi-agent reinforcement learning (c-MARL), agents learn to take actions cooperatively as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on a team. By manipulating this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for three reasons: first, it is difficult to estimate team rewards or how they are affected by an agent mispredicting; second, models are non-differentiable; and third, the feature space is low-dimensional. We therefore introduce a novel two-step attack. The attacker first trains a policy network with reinforcement learning to find a wrong action it should encourage the victim agent to take. Then, the adversary uses targeted adversarial examples to force the victim to take this action. Our results on the StarCraft II multi-agent benchmark demonstrate that c-MARL teams are highly vulnerable to perturbations applied to one of their agents' observations. By attacking a single agent, our attack method has a highly negative impact on the overall team reward, reducing it from 20 to 9.4, and the team's winning rate drops from 98.9% to 0%.
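The second step of the attack (forcing the victim to take a chosen action with a targeted adversarial example) can be illustrated with a generic iterative sign-gradient perturbation (FGSM/PGD-style) against a hypothetical linear softmax policy. This is a minimal sketch under stated assumptions: the victim weights `W`, the observation, the budget `eps`, and the step size are all invented for illustration, and the paper's actual perturbation method and networks are not reproduced here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, x):
    # Hypothetical linear victim policy: one weight row per action.
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def targeted_attack(W, x, target, eps=0.6, step=0.1, max_steps=50):
    """Perturb observation x (within an L-inf ball of radius eps) until the policy picks `target`."""
    x_adv = list(x)
    for _ in range(max_steps):
        if argmax(logits(W, x_adv)) == target:
            break
        p = softmax(logits(W, x_adv))
        # Gradient of cross-entropy toward `target` w.r.t. x is W^T (p - one_hot(target)).
        err = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
        grad = [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]
        # Descend the targeted loss with a sign step, then project back onto the eps-ball.
        x_adv = [xa - step * math.copysign(1.0, g) if g != 0 else xa
                 for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical 3-action victim policy
x = [1.0, 0.2]                              # clean observation
print(argmax(logits(W, x)))                 # 0
x_adv = targeted_attack(W, x, target=1)     # action 1 plays the role of the "wrong action"
print(argmax(logits(W, x_adv)))             # 1
```

In the paper's setting, the target action would come from the attacker's RL-trained policy rather than being fixed by hand as it is here.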

artificial intelligence, machine learning, reinforcement learning

arXiv:2003.03722

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.94)
Leisure & Entertainment (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.61)


Deep Adversarial Reinforcement Learning for Object Disentangling

Laux, Melvin, Arenz, Oleg, Peters, Jan, Pajarinen, Joni

arXiv.org Artificial Intelligence, Mar-8-2020

Deep learning, in combination with improved training techniques and high computational power, has led to recent advances in the field of reinforcement learning (RL) and to successful robotic RL applications such as in-hand manipulation. However, most robotic RL relies on a well-known initial state distribution, and in real-world tasks this information is often unavailable. For example, when disentangling waste objects, the actual position of the robot w.r.t. the objects may not match the positions the RL policy was trained for. To solve this problem, we present a novel adversarial reinforcement learning (ARL) framework. The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states. We train the protagonist and the adversary jointly to allow them to adapt to the changing policy of their opponent. We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task. Experiments with a KUKA LBR+ 7-DOF robot arm show that our approach outperforms the baseline method in disentangling when starting from initial states different from those provided during training.

adversary, agent, protagonist

arXiv:2003.03779

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > Finland > Pirkanmaa > Tampere (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Reinforcement Learning: An Introduction to Gradient Temporal Difference Learning Algorithms

#artificialintelligence, Mar-7-2020, 17:16:59 GMT

Reinforcement learning is one of the hottest fields to be in right now, with concrete applications growing at an incredibly rapid pace, from beating video games to robotics. At its essence, reinforcement learning (RL) deals with decision making, i.e., it attempts to answer the question of how an agent should act in a given environment. Loosely speaking, all of RL comes down to either finding or evaluating a policy, which is just a way of behaving. For example, a policy could be a playing strategy in chess. A policy takes a state (in the chess example, the position of all the pieces on the board) and assigns an action to it. For instance, given the state of your chess board, your policy might tell you to move your queen forward.
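To make "a policy maps states to actions" and "evaluating a policy" concrete, here is a minimal sketch of plain tabular TD(0) policy evaluation. Note that this is ordinary TD(0), not the gradient-TD variants named in the title, and the four-state chain environment, `GAMMA = 0.9`, and the step size are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

N_STATES = 4  # states 0..3; moving right from state 3 reaches a terminal state
GAMMA = 0.9   # discount factor (assumed)


def step(state: int, action: str) -> Tuple[Optional[int], float]:
    """Hypothetical deterministic chain: reward 1.0 only for stepping off the right end."""
    nxt = state + 1 if action == "right" else max(state - 1, 0)
    if nxt == N_STATES:
        return None, 1.0  # None marks the terminal state
    return nxt, 0.0


def td0_evaluate(policy: Dict[int, str], episodes: int = 500, alpha: float = 0.5) -> List[float]:
    """Tabular TD(0) estimate of V^pi for a fixed policy; every episode starts in state 0."""
    v = [0.0] * N_STATES
    for _ in range(episodes):
        s: Optional[int] = 0
        while s is not None:
            s_next, r = step(s, policy[s])  # the policy assigns an action to the state
            target = r + (GAMMA * v[s_next] if s_next is not None else 0.0)
            v[s] += alpha * (target - v[s])  # move V(s) toward the one-step TD target
            s = s_next
    return v


policy = {s: "right" for s in range(N_STATES)}  # "always move right"
print([round(x, 3) for x in td0_evaluate(policy)])  # [0.729, 0.81, 0.9, 1.0]
```

The estimates match the discounted return $\gamma^{3-s}$ of reaching the goal from state $s$. The loop only terminates for policies that eventually reach the right end, which is why the sketch evaluates "always move right".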

convergence, gradient temporal difference learning algorithm, reinforcement learning


Industry: Leisure & Entertainment > Games > Chess (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Q Learning Intro/Table - Reinforcement Learning p.1

#artificialintelligence, Mar-7-2020, 10:34:53 GMT

Welcome to a reinforcement learning tutorial. In this part, we're going to focus on Q-Learning. Q-Learning is a model-free reinforcement learning algorithm, in the sense that the AI "agent" does not need to know or have a model of the environment it will be in. The same algorithm can be used across a variety of environments. For a given environment, everything is broken down into "states" and "actions."
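A minimal sketch of the idea, assuming a hypothetical four-state chain environment (not the environment used in the tutorial): the agent never looks inside `step`, it only observes transitions and rewards, and it fills in a Q-table indexed by state and action. The step size, discount, and ε-greedy exploration rate are illustrative assumptions.

```python
import random

N_STATES = 4     # states 0..3; moving right from state 3 ends the episode
LEFT, RIGHT = 0, 1
GAMMA = 0.9      # discount factor (assumed)


def step(state, action):
    """Hypothetical deterministic chain: reward 1.0 only for stepping off the right end."""
    nxt = state + 1 if action == RIGHT else max(state - 1, 0)
    if nxt == N_STATES:
        return None, 1.0  # terminal
    return nxt, 0.0


def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table: q[state][action]
    for _ in range(episodes):
        s = 0
        while s is not None:
            # Epsilon-greedy behaviour; ties (e.g. the all-zero start) are broken at random.
            if rng.random() < epsilon or q[s][LEFT] == q[s][RIGHT]:
                a = rng.choice((LEFT, RIGHT))
            else:
                a = LEFT if q[s][LEFT] > q[s][RIGHT] else RIGHT
            s_next, r = step(s, a)
            # Off-policy target: bootstrap from the best action value in the next state.
            best_next = max(q[s_next]) if s_next is not None else 0.0
            q[s][a] += alpha * (r + GAMMA * best_next - q[s][a])
            s = s_next
    return q


q = q_learning()
greedy = ["left" if row[LEFT] > row[RIGHT] else "right" for row in q]
print(greedy)  # expected: ['right', 'right', 'right', 'right']
```

Only `step` would change for a different environment; the update rule and the table layout stay the same, which is the sense in which the same algorithm carries over across environments.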

learning intro table, q-learning, tutorial


Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate

Zhang, Yufeng, Cai, Qi, Yang, Zhuoran, Wang, Zhaoran

arXiv.org Machine Learning, Mar-7-2020

Generative adversarial imitation learning (GAIL) has demonstrated tremendous success in practice, especially when combined with neural networks. Unlike reinforcement learning, GAIL learns both the policy and the reward function from expert (human) demonstrations. Despite its empirical success, it remains unclear whether GAIL with neural networks converges to the globally optimal solution. The major difficulty comes from the nonconvex-nonconcave minimax optimization structure. To bridge the gap between practice and theory, we analyze a gradient-based algorithm with alternating updates and establish its sublinear convergence to the globally optimal solution. To the best of our knowledge, our analysis is the first to establish the global optimality and convergence rate of GAIL with neural networks.
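"Alternating updates" for a minimax problem means that one player takes a gradient step and the other player then responds to the already-updated iterate. The toy sketch below shows this on the strongly convex-concave saddle problem $\min_x \max_y f(x, y) = x^2/2 + xy - y^2/2$, chosen purely for illustration. It is much simpler than GAIL's nonconvex-nonconcave objective and is not the algorithm analyzed in the paper; the step size `eta` is an assumption.

```python
def alternating_gda(eta=0.1, steps=1000, x0=1.0, y0=1.0):
    """Alternating gradient descent-ascent on the toy problem
    min_x max_y f(x, y) = x**2 / 2 + x * y - y**2 / 2, whose unique saddle point is (0, 0)."""
    x, y = x0, y0
    for _ in range(steps):
        x = x - eta * (x + y)  # descent step for the minimizing player: df/dx = x + y
        y = y + eta * (x - y)  # ascent step for the maximizer, using the *updated* x: df/dy = x - y
    return x, y


x, y = alternating_gda()
print(abs(x) < 1e-9 and abs(y) < 1e-9)  # True
```

With `eta = 0.1`, one alternating sweep is a linear map whose eigenvalues have modulus 0.9, so both iterates shrink geometrically toward the saddle point.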

assumption 4, inequality follow, neural network

arXiv:2003.03709

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Reinforcement Learning for Combinatorial Optimization: A Survey

Mazyavkina, Nina, Sviridov, Sergey, Ivanov, Sergei, Burnaev, Evgeny

arXiv.org Machine Learning, Mar-7-2020

Combinatorial optimization (CO) is the workhorse of numerous important applications in operations research, engineering, and other fields, and has therefore attracted enormous attention from the research community for over a century. Many efficient solutions to common problems use hand-crafted heuristics to construct a solution sequentially. It is therefore intriguing to ask how a combinatorial optimization problem can be formulated as a sequential decision-making process, and whether a reinforcement learning agent can implicitly learn efficient heuristics to find a solution. This survey explores the synergy between CO and the reinforcement learning (RL) framework, which could become a promising direction for solving combinatorial problems.
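A minimal sketch of casting a CO problem as sequential decision making, using the travelling salesman problem (TSP) as an assumed example: the state is the partial tour, an action is the next unvisited city, and the reward is the negative length of the edge just added. The hand-crafted nearest-neighbour heuristic plays the role of the policy; an RL agent would learn this policy instead. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Tour = Tuple[int, ...]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def available_actions(points: List[Point], tour: Tour) -> List[int]:
    return [c for c in range(len(points)) if c not in tour]


def step(points: List[Point], tour: Tour, city: int) -> Tuple[Tour, float, bool]:
    """Append `city` to the partial tour; reward is the negative edge length.
    When the tour becomes complete, the closing edge back to the start is also charged."""
    reward = -dist(points[tour[-1]], points[city])
    new_tour = tour + (city,)
    done = len(new_tour) == len(points)
    if done:
        reward -= dist(points[city], points[new_tour[0]])
    return new_tour, reward, done


def nearest_neighbour_policy(points: List[Point], tour: Tour) -> int:
    # Hand-crafted heuristic: go to the closest unvisited city (ties -> lowest index).
    return min(available_actions(points, tour), key=lambda c: dist(points[tour[-1]], points[c]))


def rollout(points: List[Point], policy: Callable[[List[Point], Tour], int]) -> Tuple[Tour, float]:
    tour, total, done = (0,), 0.0, len(points) == 1
    while not done:
        tour, r, done = step(points, tour, policy(points, tour))
        total += r
    return tour, total


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(rollout(square, nearest_neighbour_policy))  # ((0, 1, 2, 3), -4.0)
```

Because the return of an episode is minus the tour length, maximizing expected return in this formulation is the same as minimizing tour length.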

algorithm, learning, reinforcement

arXiv:2003.036

Country:

North America > United States (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Europe > France (0.04)

Genre: Overview (0.88)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Convergence of Q-value in case of Gaussian rewards

Miyamoto, Konatsu, Suzuki, Masaya, Kigami, Yuma, Satake, Kodai

arXiv.org Machine Learning, Mar-7-2020

In this paper, as a study of reinforcement learning, we prove convergence of the Q-function when rewards are unbounded, for example Gaussian-distributed. By the central limit theorem, it is natural in some real-world applications to assume that rewards follow a Gaussian distribution, but existing proofs cannot guarantee convergence of the Q-function in that case. Furthermore, in distributional reinforcement learning and Bayesian reinforcement learning, which have become popular in recent years, it is preferable to allow the reward to have a Gaussian distribution. Therefore, in this paper, we prove the convergence of the Q-function under the condition $E[r(s,a)^2]<\infty$, which is much weaker than the conditions used in existing research. Finally, as a bonus, we also include a proof of the policy gradient theorem for distributed reinforcement learning.
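As a purely numerical illustration (not the paper's proof), the sketch below runs Q-learning on an assumed one-state, one-action MDP whose reward is drawn from $N(\mu, \sigma^2)$. The reward is unbounded but satisfies $E[r^2] = \mu^2 + \sigma^2 < \infty$, and the fixed point is $Q^* = \mu / (1 - \gamma)$. The polynomial step size $\alpha_n = n^{-\omega}$ with $\omega = 0.7$ is an assumed Robbins-Monro schedule.

```python
import random


def q_learning_gaussian(mu=1.0, sigma=1.0, gamma=0.5, n_steps=100_000, omega=0.7, seed=0):
    """Q-learning on a hypothetical one-state, one-action MDP with reward ~ N(mu, sigma^2).

    The single state transitions to itself, so max_a' Q(s', a') is just q, and
    the fixed point is Q* = mu / (1 - gamma).
    """
    rng = random.Random(seed)
    q = 0.0
    for n in range(1, n_steps + 1):
        r = rng.gauss(mu, sigma)          # unbounded reward with a finite second moment
        alpha = 1.0 / n ** omega          # sum of alphas diverges; sum of squares converges for omega > 0.5
        q += alpha * (r + gamma * q - q)  # standard Q-learning update toward the TD target
    return q


q = q_learning_gaussian()
print(abs(q - 2.0) < 0.15)  # True: close to mu / (1 - gamma) = 2.0
```

The Gaussian samples are occasionally far from the mean, yet the decaying step sizes average this noise out. This behaviour is what a convergence guarantee under $E[r(s,a)^2]<\infty$ is meant to cover.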

convergence, probability 1, reinforcement learning

arXiv:2003.03526

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


Adversarial Machine Learning: Perspectives from Adversarial Risk Analysis

Insua, David Rios, Naveiro, Roi, Gallego, Victor, Poulos, Jason

arXiv.org Artificial Intelligence, Mar-7-2020

Adversarial Machine Learning (AML) is emerging as a major field aimed at protecting automated ML systems against security threats. Most work in this area builds on a game-theoretic framework that models a conflict between an attacker and a defender. After reviewing game-theoretic approaches to AML, we discuss the benefits that a Bayesian Adversarial Risk Analysis (ARA) perspective brings when defending ML-based systems. A research agenda is included.
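The practical difference between a worst-case view and a Bayesian decision-analytic view can be seen with a hypothetical two-defence, two-attack utility table. This is a minimal sketch under loud assumptions: the maximin rule below is only a simple stand-in for a game-theoretic solution, and the belief `p_attack` is simply assumed. In a full ARA analysis, that belief would be derived by modelling the attacker's own decision problem under uncertainty.

```python
def maximin_choice(utility):
    """Worst-case rule: pick the defence whose worst outcome over attacker actions is best."""
    return max(range(len(utility)), key=lambda d: min(utility[d]))


def expected_utility_choice(utility, p_attack):
    """Bayesian rule: pick the defence with the highest expected utility under a belief over attacks."""
    return max(range(len(utility)),
               key=lambda d: sum(p * u for p, u in zip(p_attack, utility[d])))


# Hypothetical defender utilities: rows = defences, columns = attacker actions (no attack, evasion).
DEFENCES = ["no filter", "input filter"]
utility = [
    [10.0, -20.0],  # no filter: best if there is no attack, worst if evasion happens
    [6.0, 2.0],     # input filter: costs some accuracy, but limits damage from evasion
]
p_attack = [0.9, 0.1]  # ASSUMED belief: P(no attack) = 0.9, P(evasion) = 0.1

print(DEFENCES[maximin_choice(utility)])                     # input filter
print(DEFENCES[expected_utility_choice(utility, p_attack)])  # no filter
```

The worst-case rule always pays for protection against the most damaging attack. The expected-utility rule weighs that attack by how likely the defender believes it to be (here 7.0 versus 5.6), so the two rules can recommend different defences.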

adversary, attacker, defender

arXiv:2003.03546

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
