AITopics

doi: 10.1109/TWC.2021.3069240

2103.15371

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Mar-29-2021

Robust Reinforcement Learning under model misspecification

Yu, Lebin, Wang, Jian, Zhang, Xudong

Reinforcement learning has achieved remarkable performance in a wide range of tasks these days. Nevertheless, some unsolved problems limit its applications in real-world control. One of them is model misspecification, a situation where an agent is trained and deployed in environments with different transition dynamics. We propose a novel framework that utilizes history trajectories and Partially Observable Markov Decision Process modeling to deal with this dilemma. Additionally, we put forward an efficient adversarial attack method to assist robust training. Our experiments in four gym domains validate the effectiveness of our framework.

agent, arxiv preprint arxiv, model misspecification, (12 more...)

2103.1537

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.51)
Government > Military (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

arXiv.org Artificial Intelligence, Mar-29-2021

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Mohanty, Sharada, Poonganam, Jyotish, Gaidon, Adrien, Kolobov, Andrey, Wulfe, Blake, Chakraborty, Dipam, Šemetulskis, Gražvydas, Schapke, João, Kubilius, Jonas, Pašukonis, Jurgis, Klimas, Linas, Hausknecht, Matthew, MacAlpine, Patrick, Tran, Quang Nhat, Tumiel, Thomas, Tang, Xiaocheng, Chen, Xinwei, Hesse, Christopher, Hilton, Jacob, Guss, William Hebgen, Genc, Sahika, Schulman, John, Cobbe, Karl

The NeurIPS 2020 Procgen Competition was designed as a centralized benchmark with clearly defined tasks for measuring Sample Efficiency and Generalization in Reinforcement Learning. Generalization remains one of the most fundamental challenges in deep reinforcement learning, and yet we do not have enough benchmarks to measure the progress of the community on Generalization in Reinforcement Learning. We present the design of a centralized benchmark for Reinforcement Learning which can help measure Sample Efficiency and Generalization in Reinforcement Learning by doing end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases in a scalable way. We designed the benchmark on top of the existing Procgen Benchmark by defining clear tasks and standardizing the end-to-end evaluation setups. The design aims to maximize the flexibility available for researchers who wish to design future iterations of such benchmarks, and yet imposes necessary practical constraints to allow for a system like this to scale. This paper presents the competition setup and the details and analysis of the top solutions identified through this setup in the context of the 2020 iteration of the competition at NeurIPS.

mean normalized reward, submission, training phase, (15 more...)

2103.15332

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Ramakrishnan, Ramya, Unhelkar, Vaibhav, Kamar, Ece, Shah, Julie

A Bayesian Approach to Identifying Representational Errors

arXiv.org Artificial Intelligence, Mar-28-2021

Trained AI systems and expert decision makers can make errors that are often difficult to identify and understand. Determining the root cause of these errors can improve future decisions. This work presents the Generative Error Model (GEM), a generative model for inferring representational errors based on observations of an actor's behavior (either simulated agent, robot, or human). The model considers two sources of error: those that occur due to representational limitations -- "blind spots" -- and non-representational errors, such as those caused by noise in execution or systematic errors present in the actor's policy. Disambiguating these two error types allows for targeted refinement of the actor's policy (i.e., representational errors require perceptual augmentation, while other errors can be reduced through methods such as improved training or attention support). We present a Bayesian inference algorithm for GEM and evaluate its utility in recovering representational errors on multiple domains. Results show that our approach can recover blind spots of both reinforcement learning agents and human users.

actor, blind spot, representation, (13 more...)

2103.15171

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Mar-27-2021

KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning

Gao, Zijian, Xu, Kele, Ding, Bo, Wang, Huaimin, Li, Yiying, Jia, Hongda

Recently, deep Reinforcement Learning (RL) algorithms have achieved dramatic progress in the multi-agent area. However, training on increasingly complex tasks can be time-consuming and resource-intensive. To alleviate this problem, efficiently leveraging historical experience is essential, yet this is under-explored in previous studies, as most existing methods may fail to achieve this goal in a continuously varying system due to their complicated design and environmental dynamics. In this paper, we propose a method, named "KnowRU", for knowledge reusing which can be easily deployed in the majority of multi-agent reinforcement learning algorithms without complicated hand-coded design. We employ the knowledge distillation paradigm to transfer knowledge among agents with the goal of accelerating the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments on state-of-the-art multi-agent reinforcement learning (MARL) algorithms on collaborative and competitive scenarios. The results show that KnowRU can outperform the recently reported methods, which emphasizes the importance of the proposed knowledge reusing for MARL.
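
The abstract above does not state KnowRU's exact objective, so the following is only a generic illustration of the knowledge distillation paradigm it mentions: a student policy is pulled toward a teacher policy by minimizing the KL divergence between their temperature-softened action distributions. The function names, the temperature value, and the use of action logits are assumptions made for this sketch, not details from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened action distributions.

    Hypothetical helper: a generic distillation term, not KnowRU's
    published loss.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two policies disagree, which is what lets a previously trained agent guide a new one.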

agent, knowledge, scenario, (16 more...)

2103.14891

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Mar-27-2021

Co-Imitation Learning without Expert Demonstration

Ning, Kun-Peng, Xu, Hu, Zhu, Kun, Huang, Sheng-Jun

Imitation learning is a primary approach to improve the efficiency of reinforcement learning by exploiting the expert demonstrations. However, in many real scenarios, obtaining expert demonstrations could be extremely expensive or even impossible. To overcome this challenge, in this paper, we propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents themselves without expert demonstration. Specifically, we train two different agents via letting each of them alternately explore the environment and exploit the peer agent's experience. While the experiences could be valuable or misleading, we propose to estimate the potential utility of each piece of experience with the expected gain of the value function. Thus the agents can selectively imitate from each other by emphasizing the more useful experiences while filtering out noisy ones. Experimental results on various tasks show significant superiority of the proposed Co-Imitation Learning framework, validating that the agents can benefit from each other without external supervision.

agent, demonstration, exploration, (12 more...)

2103.14823

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Kenton, Zachary, Everitt, Tom, Weidinger, Laura, Gabriel, Iason, Mikulik, Vladimir, Irving, Geoffrey

Alignment of Language Agents

For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including deceptive or manipulative language, and review some approaches for avoiding these issues.

agent, language agent, manipulation, (13 more...)

2103.14659

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (0.82)

Industry: Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(2 more...)

Blum, Tamir, Paillet, Gabin, Masawat, Watcharawut, Laine, Mickael, Yoshida, Kazuya

SegVisRL: Visuomotor Development for a Lunar Rover for Hazard Avoidance using Camera Images

The visuomotor system of any animal is critical for its survival, and the development of a complex one within humans is a large factor in our success as a species on Earth. This system is an essential part of our ability to adapt to our environment. We use this system continuously throughout the day, when picking something up, or walking around while avoiding bumping into objects. Equipping robots with such capabilities will help produce more intelligent locomotion with the ability to more easily understand their surroundings and to move safely. In particular, such capabilities are desirable for traversing the lunar surface, as it is full of hazardous obstacles, such as rocks. These obstacles need to be identified and avoided in real time. This paper seeks to demonstrate the development of a visuomotor system within a robot for navigation and obstacle avoidance, with complex rock-shaped objects representing hazards. Our approach uses deep reinforcement learning with only image data. In this paper, we compare the results from several neural network architectures and a preprocessing methodology which includes producing a segmented image and downsampling.

controller, obstacle, rover, (15 more...)

2103.14422

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

MedSelect: Selective Labeling for Medical Image Classification Combining Meta-Learning with Deep Reinforcement Learning

Smit, Akshay, Vrabac, Damir, He, Yujie, Ng, Andrew Y., Beam, Andrew L., Rajpurkar, Pranav

We propose a selective learning method using meta-learning and deep reinforcement learning for medical image interpretation in the setting of limited labeling resources. Our method, MedSelect, consists of a trainable deep learning selector that uses image embeddings obtained from contrastive pretraining for determining which images to label, and a non-parametric selector that uses cosine similarity to classify unseen images. We demonstrate that MedSelect learns an effective selection strategy outperforming baseline selection strategies across seen and unseen medical conditions for chest X-ray interpretation. We also perform an analysis of the selections performed by MedSelect comparing the distribution of latent embeddings and clinical features, and find significant differences compared to the strongest performing baseline. We believe that our method may be broadly applicable across medical imaging settings where labels are expensive to acquire.
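
As a rough illustration of classifying unseen images by cosine similarity, as the abstract above describes, the sketch below assigns a query embedding the label of its most similar labeled embedding. The nearest-neighbor decision rule, the function names, and the toy two-dimensional embeddings are assumptions for illustration; the abstract does not specify how MedSelect aggregates similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def classify(query, labeled):
    """Return the label of the labeled embedding most similar to `query`.

    `labeled` is a list of (embedding, label) pairs, standing in for the
    small set of images chosen for labeling (hypothetical data layout).
    """
    _, best_label = max(labeled, key=lambda pair: cosine_similarity(query, pair[0]))
    return best_label
```

Because the classifier has no trainable parameters, its accuracy depends entirely on which images were selected for labeling, which is why the selection strategy matters.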

baseline, medselect, x-ray, (14 more...)

2103.14339

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.96)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots

Li, Zhongyu, Cheng, Xuxin, Peng, Xue Bin, Abbeel, Pieter, Levine, Sergey, Berseth, Glen, Sreenath, Koushil

Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.
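
Domain randomization, as mentioned in the abstract above, can be sketched as drawing a fresh set of simulator dynamics parameters at the start of every training episode. The parameter names and ranges below are hypothetical placeholders, not the values used for Cassie.

```python
import random

# Hypothetical parameter ranges, chosen only for illustration.
DYNAMICS_RANGES = {
    "ground_friction": (0.5, 1.5),
    "link_mass_scale": (0.8, 1.2),
    "motor_delay_s": (0.0, 0.02),
}


def sample_dynamics(rng, ranges=DYNAMICS_RANGES):
    """Draw one value per dynamics parameter, uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def randomized_episodes(num_episodes, seed=0):
    """Yield one sampled dynamics configuration per training episode."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        yield sample_dynamics(rng)
```

A policy trained across many such configurations cannot rely on any single setting of the dynamics, which is the intuition behind using randomization to ease sim-to-real transfer.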

cassie, controller, robot, (13 more...)

2103.14295

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)