AITopics

2010.01112

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Michigan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning Oct-2-2020

Adaptive Procedural Task Generation for Hard-Exploration Problems

Fang, Kuan, Zhu, Yuke, Savarese, Silvio, Fei-Fei, Li

We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks. Through adversarial training, the task similarity is adaptively estimated by a task discriminator defined on the agent's experiences, allowing the generated tasks to approximate target tasks of unknown parameterization or outside of the predefined task space. Our experiments on grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.

machine learning, reinforcement learning, target task, (11 more...)

2007.0035

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Gong, Xiao-Yue, Simchi-Levi, David

Provably More Efficient Q-Learning in the One-Sided-Feedback/Full-Feedback Settings

arXiv.org Machine Learning Oct-2-2020

Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting. We also provide a simpler variant of the algorithm, Full-Q-Learning (FQL), for the full-feedback setting. We establish that HQL incurs $\tilde{\mathcal{O}}(H^3\sqrt{T})$ regret and FQL incurs $\tilde{\mathcal{O}}(H^2\sqrt{T})$ regret, where $H$ is the length of each episode and $T$ is the total length of the horizon. The regret bounds are not affected by the possibly huge state and action space. Our numerical experiments demonstrate the superior efficiency of HQL and FQL, and the potential to combine reinforcement learning with richer feedback models.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2007.0008

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Brero, Gianluca, Eden, Alon, Gerstgrasser, Matthias, Parkes, David C., Rheingans-Yoo, Duncan

Reinforcement Learning of Simple Indirect Mechanisms

arXiv.org Artificial Intelligence Oct-2-2020

Over the last fifty years, a large body of research in microeconomics has introduced many different mechanisms for resource allocation. Despite the wide variety of available options, "simple" mechanisms such as posted price and serial dictatorship are often preferred for practical applications, including housing allocation [Abdulkadiroğlu and Sönmez, 1998], online procurement [Badanidiyuru et al., 2012], or allocation of medical appointments [Klaus and Nichifor, 2019]. There has been considerable interest in formalizing different notions of simplicity. Li [2017] identifies mechanisms that are particularly simple from a strategic perspective, introducing the concept of obviously strategyproof mechanisms; under obviously strategyproof mechanisms, it is obvious that an agent cannot profit by trying to game the system, as even the worst possible final outcome from behaving truthfully is at least as good as the best possible outcome from any other strategy. Pycia and Troyan [2019] introduce the still stronger concept of strongly obviously strategyproof (SOSP) mechanisms, and show that this class can essentially be identified with sequential price mechanisms, where agents are visited in turn and offered a choice from a menu of options (which may or may not include transfers). SOSP mechanisms are ones in which an agent is not even required to consider her future (truthful) actions to understand that the mechanism is obviously strategyproof.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

2010.0118

Country:

North America > United States (0.14)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Zhang, Shangtong, Laroche, Romain, van Seijen, Harm, Whiteson, Shimon, Combes, Remi Tachet des

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

arXiv.org Artificial Intelligence Oct-2-2020

We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective ($\gamma = 1$), where $\gamma^t$ disappears naturally ($1^t = 1$). We then propose to interpret the discounting in the critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
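The mismatch described in the abstract can be made concrete with a small numerical sketch. The code below is an illustration only, not the authors' implementation: the function names `actor_weights` and `discounted_returns`, the reward sequence, and the choice of $\gamma = 0.5$ are all assumptions made for this example. It contrasts the per-step weight $\gamma^t$ that the theory places on the actor update with the constant weight of 1 that results when the term is dropped, while the critic target stays discounted in both cases.

```python
def actor_weights(num_steps, gamma, discount_actor):
    """Weight applied to the actor (policy-gradient) term at each time step t.

    discount_actor=True  -> gamma**t, as in the theoretical update.
    discount_actor=False -> 1.0, the common practice of dropping gamma**t.
    """
    return [gamma ** t if discount_actor else 1.0 for t in range(num_steps)]


def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1}, the target of a discounted critic."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical four-step trajectory with a reward of 1 at every step.
rewards = [1.0, 1.0, 1.0, 1.0]
gamma = 0.5

theory = actor_weights(len(rewards), gamma, discount_actor=True)
practice = actor_weights(len(rewards), gamma, discount_actor=False)
returns = discounted_returns(rewards, gamma)

print("actor weights (theory):  ", theory)
print("actor weights (practice):", practice)
print("discounted critic target:", returns)
```

In this toy setting the theoretical weights shrink geometrically, so late transitions contribute little to the actor update, whereas the practical variant weights every transition equally while still using the discounted targets.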

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

2010.01069

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Oct-1-2020

Minimax Optimal Reinforcement Learning for Discounted MDPs

He, Jiafan, Zhou, Dongruo, Gu, Quanquan

The goal of reinforcement learning is to design algorithms that learn the optimal policy through interactions with the unknown dynamic environment. Markov decision processes (MDPs) play a central role in reinforcement learning due to their ability to describe the time-independent state transition property. Specifically, the discounted MDP is one of the standard MDPs in reinforcement learning to describe sequential tasks without interruption or restart. Various reinforcement learning algorithms have been proposed for discounted MDPs. In particular, Azar et al. (2013) proposed an Empirical QVI algorithm which achieves the optimal sample complexity to find the optimal value function. Sidford et al. (2018a) proposed a sublinear randomized value iteration algorithm that achieves a near-optimal sample complexity to find the optimal policy, and Sidford et al. (2018b) further improved it to reach the optimal sample complexity.

inequality hold, machine learning, reinforcement learning, (12 more...)

2010.00587

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine Learning Oct-1-2020

Understanding the Role of Adversarial Regularization in Supervised Learning

Rout, Litu

Despite numerous attempts to provide empirical evidence of adversarial regularization outperforming sole supervision, the theoretical understanding of such phenomena remains elusive. In this study, we aim to resolve whether adversarial regularization indeed performs better than sole supervision at a fundamental level. To bring this insight to fruition, we study the vanishing gradient issue, asymptotic iteration complexity, gradient flow and provable convergence in the context of sole supervision and adversarial regularization. The key ingredient is a theoretical justification supported by empirical evidence of adversarial acceleration in gradient descent. In addition, motivated by a recently introduced unit-wise capacity based generalization bound, we analyze the generalization error in the adversarial framework. Guided by our observation, we cast doubt on the ability of this measure to explain generalization. We therefore leave as an open question the exploration of new measures that can explain generalization behavior in adversarial learning. Furthermore, we observe an intriguing phenomenon in the neural embedded vector space while contrasting adversarial learning with sole supervision.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2010.00522

Country: Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Ilhan, Ercument, Perez-Liebana, Diego

Student-Initiated Action Advising via Advice Novelty

arXiv.org Machine Learning Oct-1-2020

Action advising is a knowledge exchange mechanism between peers, namely student and teacher, that can help tackle exploration and sample inefficiency problems in deep reinforcement learning. Due to the practical limitations in peer-to-peer communication and the negative implications of over-advising, the peer responsible for initiating these interactions needs to do so only when it is most appropriate to exchange advice. Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, these estimations have several weaknesses, such as having no information regarding the characteristics of convergence and being subject to delays that occur in the presence of experience replay dynamics. We propose a student-initiated action advising algorithm that alleviates these shortcomings. Specifically, we employ Random Network Distillation (RND) to measure the novelty of a piece of advice, for the student to determine whether to proceed with the request; furthermore, we perform RND updates only for the advised states to ensure that the student's convergence will not prevent it from utilising the teacher's knowledge at any stage of learning. Experiments in GridWorld and simplified versions of five Atari games show that our approach can perform on par with the state-of-the-art and demonstrate significant advantages in the scenarios where the existing methods are prone to fail.
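The idea of measuring advice novelty with RND and updating it only on advised states can be sketched in a few lines. This is a toy illustration under strong assumptions, not the authors' algorithm: linear maps stand in for the neural networks, and the class `LinearRND`, the helper `should_request_advice`, the learning rate, and the novelty threshold are hypothetical names and values chosen for this example. Novelty is the squared error between a trained predictor and a fixed random target, so it falls only for states on which the predictor has been updated.

```python
import random


class LinearRND:
    """Toy Random Network Distillation using linear maps (illustrative only)."""

    def __init__(self, in_dim, out_dim, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target map; it is never trained.
        self.target = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        # Predictor starts at zero and is trained to imitate the target.
        self.predictor = [[0.0] * in_dim for _ in range(out_dim)]
        self.lr = lr

    def _errors(self, state):
        # Per-output difference between predictor and target on this state.
        return [
            sum(p * s for p, s in zip(p_row, state))
            - sum(t * s for t, s in zip(t_row, state))
            for p_row, t_row in zip(self.predictor, self.target)
        ]

    def novelty(self, state):
        """Squared prediction error; high for states not yet trained on."""
        return sum(e * e for e in self._errors(state))

    def update(self, state):
        """One gradient step on the squared error for this state."""
        errors = self._errors(state)
        for row, err in zip(self.predictor, errors):
            for j, s in enumerate(state):
                row[j] -= self.lr * 2.0 * err * s


def should_request_advice(rnd, state, threshold):
    """Hypothetical decision rule: ask the teacher only if the state is novel."""
    return rnd.novelty(state) > threshold


rnd = LinearRND(in_dim=2, out_dim=3)
advised_state = [1.0, 0.0]   # a state for which advice was received
other_state = [0.0, 1.0]     # a state that was never advised

before = rnd.novelty(advised_state)
other_before = rnd.novelty(other_state)

# RND is updated only on the advised state, mirroring the abstract.
for _ in range(20):
    rnd.update(advised_state)

after = rnd.novelty(advised_state)
other_after = rnd.novelty(other_state)
```

Because the predictor is trained only on the advised state, its novelty drops sharply while the novelty of the never-advised state is unchanged, so a threshold rule would stop requesting advice for the former and keep requesting it for the latter.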

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2010.00381

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
South America > Brazil > São Paulo (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ammanabrolu, Prithviraj, Urbanek, Jack, Li, Margaret, Szlam, Arthur, Rocktäschel, Tim, Weston, Jason

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds

arXiv.org Artificial Intelligence Oct-1-2020

We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019), a large-scale crowd-sourced fantasy text-game, with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce a reinforcement learning system that (1) incorporates large-scale language modeling-based and commonsense reasoning-based pre-training to imbue the agent with relevant priors; and (2) leverages a factorized action space of action commands and dialogue, balancing between the two. We conduct zero-shot evaluations using held-out human expert demonstrations, showing that our agents are able to act consistently and talk naturally with respect to their motivations.

machine learning, natural language, reinforcement learning, (18 more...)

2010.00685

Country:

Europe > Germany > Saarland > Saarbrücken (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ndousse, Kamal, Eck, Douglas, Levine, Sergey, Jaques, Natasha

Multi-agent Social Reinforcement Learning Improves Generalization

arXiv.org Artificial Intelligence Oct-1-2020

Social learning is a key component of human and animal intelligence. By taking cues from the behavior of experts in their environment, social learners can acquire sophisticated behavior and rapidly adapt to new circumstances. This paper investigates whether independent reinforcement learning (RL) agents in a multi-agent environment can use social learning to improve their performance using cues from other agents. We find that in most circumstances, vanilla model-free RL agents do not use social learning, even in environments in which individual exploration is expensive. We analyze the reasons for this deficiency, and show that by introducing a model-based auxiliary loss we are able to train agents to leverage cues from experts to solve hard exploration tasks. The generalized social learning policy learned by these agents allows them to not only outperform the experts with which they trained, but also achieve better zero-shot transfer performance than solo learners when deployed to novel environments with experts. In contrast, agents that have not learned to rely on social learning generalize poorly and do not succeed in the transfer task. Further, we find that by mixing multi-agent and solo training, we can obtain agents that use social learning to outperform agents trained alone, even when experts are not available. This demonstrates that social learning has helped improve agents' representation of the task itself. Our results indicate that social learning can enable RL agents to not only improve performance on the task at hand, but improve generalization to novel environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2010.00581

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)