AITopics

doi: 10.1109/TMECH.2021.3072675

2104.07282

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Oceania > Australia > New South Wales (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AIHub, Apr-14-2021, 13:09:00 GMT

Maximum entropy RL (provably) solves some robust RL problems

Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL algorithms to perform markedly worse. As we aim to scale reinforcement learning algorithms and apply them in the real world, it is increasingly important to learn policies that are robust to changes in the environment. Broadly, prior approaches to handling distribution shift in RL aim to maximize performance in either the average case or the worst case. While these methods have been successfully applied to a number of areas (e.g., self-driving cars, robot locomotion and manipulation), their success rests critically on the design of the distribution of environments.
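
The average-case and worst-case objectives, and the intuition for why an entropy bonus might help with robustness, can be seen in a toy one-step problem. This is an illustration under stated assumptions, not the paper's construction or proof: the reward vectors, the two policies and the temperature `alpha` are all hypothetical.

```python
import math

def entropy(policy):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0)

def expected_reward(policy, rewards):
    """Expected one-step reward of a stochastic policy."""
    return sum(p * r for p, r in zip(policy, rewards))

def maxent_objective(policy, rewards, alpha):
    """One-step MaxEnt RL objective: E_pi[r] + alpha * H(pi)."""
    return expected_reward(policy, rewards) + alpha * entropy(policy)

def average_case(policy, envs):
    """Average-case objective: mean expected reward over a set of environments."""
    return sum(expected_reward(policy, r) for r in envs) / len(envs)

def worst_case(policy, envs):
    """Worst-case objective: minimum expected reward over a set of environments."""
    return min(expected_reward(policy, r) for r in envs)

# Hypothetical two-action task; each list holds the reward of each action.
train_rewards = [1.0, 0.9]
test_envs = [[1.0, 0.9], [0.0, 0.8], [1.0, 0.7]]  # hypothetical perturbed environments

greedy = [1.0, 0.0]      # deterministic reward-maximizing policy
stochastic = [0.5, 0.5]  # maximum-entropy policy over two actions

for name, pi in [("greedy", greedy), ("stochastic", stochastic)]:
    print(name,
          round(maxent_objective(pi, train_rewards, alpha=0.1), 3),
          round(average_case(pi, test_envs), 3),
          round(worst_case(pi, test_envs), 3))
```

With these numbers the greedy policy earns more reward on the training rewards (1.0 vs 0.95), but the stochastic policy has the higher MaxEnt objective at `alpha = 0.1` and a higher worst-case return on the perturbed environments (0.4 vs 0.0).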

algorithm, maxent rl, rl algorithm

AIHub

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.41)

Learning Multimodal Contact-Rich Skills from Demonstrations Without Reward Engineering

Balakuntala, Mythra V., Kaur, Upinder, Ma, Xin, Wachs, Juan, Voyles, Richard M.

Everyday contact-rich tasks, such as peeling, cleaning, and writing, demand multimodal perception for effective and precise task execution. However, these tasks present a novel challenge for robots, which lack the ability to combine such multimodal stimuli when performing contact-rich tasks. Learning-based methods have attempted to model multimodal contact-rich tasks, but they often require extensive training examples and task-specific reward functions, which limits their practicality and scope. We therefore propose a generalizable, model-free learning-from-demonstration framework that lets robots learn contact-rich skills without explicit reward engineering. We present a novel multimodal sensor data representation that improves learning performance for contact-rich skills. We performed training and experiments on a real Sawyer robot for three everyday contact-rich skills: cleaning, writing, and peeling. Notably, the framework achieves a success rate of 100% for the peeling and writing skills and 80% for the cleaning skill, suggesting that this skill-learning framework can be extended to learning other physical manipulation skills.
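
The abstract does not specify the multimodal representation, so the sketch below swaps in a deliberately simple stand-in to show what learning from demonstrations without a reward function can look like: each modality (here a hypothetical 1-D force reading and a 2-D end-effector pose) is z-score normalized with its own statistics, the results are concatenated, and a 1-nearest-neighbour behaviour-cloning policy returns the demonstrated action closest to the current observation. This is a generic baseline, not the authors' framework; the modality names, values and action labels are invented for illustration.

```python
from statistics import mean, pstdev

class NearestNeighborPolicy:
    """1-nearest-neighbour behaviour cloning over fused, per-modality-normalized features."""

    def __init__(self, demos, actions):
        # demos: modality name -> list of per-timestep vectors, one per demonstrated action
        self.names = sorted(demos)
        self.stats = {}
        for name in self.names:
            columns = list(zip(*demos[name]))
            # Per-dimension z-score statistics; constant dimensions get std 1.0 to avoid dividing by 0.
            self.stats[name] = [(mean(c), pstdev(c) or 1.0) for c in columns]
        self.states = [
            self._fuse({name: demos[name][i] for name in self.names})
            for i in range(len(actions))
        ]
        self.actions = list(actions)

    def _fuse(self, obs):
        """Normalize each modality with its own statistics, then concatenate."""
        fused = []
        for name in self.names:
            for x, (mu, sigma) in zip(obs[name], self.stats[name]):
                fused.append((x - mu) / sigma)
        return fused

    def act(self, obs):
        """Return the demonstrated action whose fused state is closest to obs."""
        query = self._fuse(obs)

        def sq_dist(state):
            return sum((a - b) ** 2 for a, b in zip(state, query))

        best = min(range(len(self.states)), key=lambda i: sq_dist(self.states[i]))
        return self.actions[best]

# Hypothetical demonstration: contact force (N) and end-effector x, y (m) per timestep.
demos = {
    "force": [[0.0], [5.0], [10.0]],
    "pose": [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]],
}
policy = NearestNeighborPolicy(demos, ["approach", "press", "slide"])
print(policy.act({"force": [9.0], "pose": [0.19, 0.0]}))
```

Normalizing per modality keeps a signal measured in newtons from drowning out one measured in metres when the two are compared in a single distance.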

demonstration, machine learning, reinforcement learning

2103.01296

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

GridToPix: Training Embodied Agents with Minimal Supervision

Jain, Unnat, Liu, Iou-Jen, Lazebnik, Svetlana, Kembhavi, Aniruddha, Weihs, Luca, Schwing, Alexander

While deep reinforcement learning (RL) promises freedom from hand-labeled data, its great successes, especially in Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal rewards, present-day results degrade significantly across Embodied AI problems, from single-agent Habitat-based PointGoal Navigation (SPL drops from 55 to 0) and two-agent AI2-THOR-based Furniture Moving (success drops from 58% to 1%) to three-agent Google Football-based 3 vs. 1 with Keeper (game score drops from 0.6 to 0.1). As training from shaped rewards does not scale to more realistic tasks, the community needs to improve the success of training with terminal rewards. To this end, we propose GridToPix: 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., they are independent of the task; 2) distill the learned policy into agents that reside in complex visual worlds. Despite learning from only terminal rewards with identical models and RL algorithms, GridToPix significantly improves results across tasks: from PointGoal Navigation (SPL improves from 0 to 64) and Furniture Moving (success improves from 1% to 25%) to football gameplay (game score improves from 0.1 to 0.6). GridToPix even helps to improve the results of shaped-reward training.
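
Step 2 of GridToPix, distilling a gridworld policy into an agent that receives richer observations, can be sketched as fitting a student's action distribution to the teacher's. The sketch assumes a cross-entropy (behaviour-cloning-style) distillation loss and replaces the student network with a lookup table of logits keyed by a hypothetical `render` function; the paper's actual loss, architecture and observations may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill(teacher, render, num_actions, epochs=1000, lr=0.5):
    """Fit a student to a teacher's action distributions by minimizing cross-entropy.

    teacher: grid state -> action probabilities from a policy trained in the gridworld.
    render:  grid state -> observation seen by the student (hypothetical).
    The student is a table of logits keyed by observation, standing in for a
    network that would map pixels to action logits.
    """
    student = {render(s): [0.0] * num_actions for s in teacher}
    for _ in range(epochs):
        for state, target in teacher.items():
            obs = render(state)
            probs = softmax(student[obs])
            # Gradient of cross-entropy(target, softmax(logits)) w.r.t. the logits is probs - target.
            student[obs] = [z - lr * (p - t) for z, p, t in zip(student[obs], probs, target)]
    return student

# Hypothetical teacher policy over 3 actions for two grid cells.
teacher = {(0, 0): [0.90, 0.05, 0.05], (0, 1): [0.10, 0.80, 0.10]}

def render(state):
    # Hypothetical stand-in for rendering a grid cell as an image identifier.
    return f"frame_{state[0]}_{state[1]}"

student = distill(teacher, render, num_actions=3)
for s, target in teacher.items():
    print(s, [round(p, 2) for p in softmax(student[render(s)])], target)
```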

agent, reward structure, terminal reward

2105.00931

Country: North America > United States > Illinois (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Discover the Hidden Attack Path in Multi-domain Cyberspace Based on Reinforcement Learning

Zhang, Lei, Bai, Wei, Li, Wei, Xia, Shiming, Zheng, Qibin

In this work, we present a learning-based approach to analyzing cyberspace security configurations. Unlike prior methods, our approach can learn from past experience and improve over time. In particular, as we train a greater number of agents as attackers, our method becomes better than previous methods at discovering hidden attack paths, especially in multi-domain cyberspace. To achieve these results, we pose attack-path discovery as a Reinforcement Learning (RL) problem and train an agent to discover multi-domain cyberspace attack paths. To enable our RL policy to discover more hidden and shorter attack paths, we introduce a multi-domain action selection module into the RL framework. Our objective is to use the proposed method to discover more hidden and shorter attack paths and thereby analyze the weaknesses of cyberspace security configurations. Finally, we designed a simulated cyberspace experimental environment to verify the proposed method; the experimental results show that our method can discover more hidden multi-domain attack paths and shorter attack paths than existing baseline methods.
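
Posing attack-path discovery as RL can be illustrated with tabular Q-learning on a small graph: a state is the node the attacker currently controls, an action is the next node to compromise, each step costs -1 (favouring shorter paths), and reaching the target pays +10. The network, the node names (the `it/` and `ot/` prefixes stand in for two domains) and all hyperparameters are hypothetical, and the sketch omits the paper's multi-domain action selection module.

```python
import random

def q_learning(graph, start, target, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, step_cost=-1.0, goal_reward=10.0, max_steps=50, seed=0):
    """Tabular Q-learning; a state is a controlled node, an action is the next node to attack."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in graph for a in graph[s]}
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            actions = graph[state]
            if not actions:  # dead end: no further lateral movement possible
                break
            if rng.random() < epsilon:  # epsilon-greedy exploration
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            reached = action == target
            # Per-step cost favours shorter paths; reaching the target pays a bonus.
            reward = goal_reward if reached else step_cost
            next_actions = [] if reached else graph[action]
            best_next = max((q[(action, b)] for b in next_actions), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = action
            if reached:
                break
    return q

def greedy_path(q, graph, start, target, max_len=20):
    """Follow the highest-valued action from start until the target or a dead end."""
    path = [start]
    while path[-1] != target and graph[path[-1]] and len(path) <= max_len:
        node = path[-1]
        path.append(max(graph[node], key=lambda a: q[(node, a)]))
    return path

# Hypothetical network spanning an IT domain and an OT domain.
network = {
    "internet": ["it/web_server", "it/vpn_gateway"],
    "it/web_server": ["it/app_server"],
    "it/app_server": ["ot/scada_db"],
    "it/vpn_gateway": ["ot/scada_db"],
    "ot/scada_db": [],
}
q = q_learning(network, "internet", "ot/scada_db")
print(greedy_path(q, network, "internet", "ot/scada_db"))
```

Because every step is penalized, the learned greedy path prefers the two-hop route through the VPN gateway over the three-hop route through the web and application servers.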

attacker, cyberspace, information

2104.07195

Country:

North America > United States > California > Orange County > Anaheim (0.04)
Europe > Spain (0.04)
Asia > South Korea > Busan > Busan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An Introduction of mini-AlphaStar

Liu, Ruo-Ze, Wang, Wenhai, Shen, Yanjie, Li, Zhiqi, Yu, Yang, Lu, Tong

StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to win. Because of its difficulties, such as a huge state space, a varied action space, a long time horizon, and imperfect information, SC2 has been a research highlight in reinforcement learning. Recently, an SC2 agent called AlphaStar was proposed that shows excellent performance, reaching Grandmaster level and ranking above 99.8% of officially ranked human players. We implemented a mini-scaled version of it, called mini-AlphaStar, based on the AlphaStar paper and the pseudocode its authors provided. This technical report presents its usage and an analysis of it. The difference between AlphaStar and mini-AlphaStar is that we replaced the hyperparameters of the former with much smaller ones for mini-scale training. The code of mini-AlphaStar is fully open-sourced. The objective of mini-AlphaStar is to provide a reproduction of the original AlphaStar and to facilitate future RL research on large-scale problems.

agent, encoder, linear layer

2104.0689

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Reward function shape exploration in adversarial imitation learning: an empirical study

Wang, Yawei, Li, Xiu

Adversarial imitation learning algorithms (AILs) obtain no true rewards from the environment for learning the policy; instead, they still require pseudo-rewards based on the output of the discriminator. Given the implicit reward bias problem in AILs, we design several representative reward function shapes and compare their performance through large-scale experiments. To ensure the reliability of our results, we conduct the experiments on a series of MuJoCo and Box2D continuous control tasks with four different AILs. We also compare the performance of the various reward function shapes using varying numbers of expert trajectories. The empirical results reveal that the positive logarithmic reward function works well in typical continuous control tasks, whereas the so-called unbiased reward function is limited to specific kinds of tasks. Furthermore, several of the designed reward functions also perform excellently in these environments.
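
The reward shapes compared in the study are functions of the discriminator output. The sketch below assumes D(s, a) is the discriminator's estimated probability that a state-action pair came from the expert, and that the "positive logarithmic" and "unbiased" shapes correspond to -log(1 - D) and log D - log(1 - D) respectively; the paper's exact definitions may differ. `EPS` is an arbitrary clipping constant.

```python
import math

EPS = 1e-8  # keeps log() finite when D saturates at 0 or 1

def _clip(d):
    return min(max(d, EPS), 1.0 - EPS)

def positive_log_reward(d):
    """-log(1 - D): strictly positive for every D in (0, 1)."""
    return -math.log(1.0 - _clip(d))

def negative_log_reward(d):
    """log D: strictly negative for every D in (0, 1)."""
    return math.log(_clip(d))

def unbiased_reward(d):
    """log D - log(1 - D): the discriminator logit; its sign depends on D."""
    d = _clip(d)
    return math.log(d) - math.log(1.0 - d)

for d in (0.1, 0.5, 0.9):
    print(d,
          round(positive_log_reward(d), 3),
          round(negative_log_reward(d), 3),
          round(unbiased_reward(d), 3))
```

A commonly cited intuition is that a strictly positive per-step reward encourages an agent to prolong episodes, while a strictly negative one encourages early termination, which is one way an implicit reward bias can arise. The logit-style reward is simply the sum of the other two.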

discriminator, reward function, trajectory

2104.06687

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

Shah, Dhruv, Eysenbach, Benjamin, Rhinehart, Nicholas, Levine, Sergey

We describe a robotic learning system for autonomous navigation in diverse environments. At the core of our method are two components: (i) a non-parametric map that reflects the connectivity of the environment but does not require geometric reconstruction or localization, and (ii) a latent variable model of distances and actions that enables efficiently constructing and traversing this map. The model is trained on a large dataset of prior experience to predict the expected amount of time and next action needed to transit between the current image and a goal image. Training the model in this way enables it to develop a representation of goals robust to distracting information in the input images, which aids in deploying the system to quickly explore new environments. We demonstrate our method on a mobile ground robot in a range of outdoor navigation scenarios. Our method can learn to reach new goals, specified as images, within a radius of up to 80 meters in just 20 minutes, and reliably revisit these goals in changing environments. We also demonstrate our method's robustness to previously unseen obstacles and variable weather conditions. We encourage the reader to visit the project website for videos of our experiments and demonstrations: https://sites.google.com/view/recon-robot
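
Once a model predicts the traversal time between stored images, a non-parametric map of this kind can be built by connecting image pairs whose predicted time falls under a threshold, and a route can then be found with Dijkstra's algorithm. This is a simplified sketch: `predicted_time` is a hypothetical stand-in for the learned latent goal model, the threshold is arbitrary, and RECON's actual graph construction and exploration strategy are not reproduced.

```python
import heapq

def build_topological_map(nodes, predicted_time, max_edge_time):
    """Connect every ordered pair of nodes whose predicted travel time is under a threshold."""
    graph = {n: [] for n in nodes}
    for a in nodes:
        for b in nodes:
            if a != b:
                t = predicted_time(a, b)
                if t <= max_edge_time:
                    graph[a].append((b, t))
    return graph

def plan(graph, start, goal):
    """Dijkstra's algorithm; returns (path, total predicted time) or (None, inf)."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Hypothetical stored images, placed along a straight path (positions in metres).
positions = {"img0": 0.0, "img1": 10.0, "img2": 20.0, "img3": 30.0}

def predicted_time(a, b):
    # Hypothetical stand-in for the learned distance model: travel at 1 m/s.
    return abs(positions[a] - positions[b])

graph = build_topological_map(list(positions), predicted_time, max_edge_time=12.0)
print(plan(graph, "img0", "img3"))
```

The threshold controls how far the robot is trusted to travel between stored images: edges are only kept where the model predicts a short traversal, so long-range goals are reached by chaining nearby ones.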

exploration, new environment, representation

2104.05859

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

#artificialintelligence, Apr-13-2021, 20:36:28 GMT

Artificial Intelligence: Reinforcement Learning in Python

Complete guide to Artificial Intelligence, prep for Deep Reinforcement Learning with Stock Trading Applications. Bestseller, rated 4.6 (5,404 ratings). Created by Lazy Programmer Inc. English [Auto-generated], Portuguese [Auto-generated], and one more language.

artificial intelligence, learning, machine learning

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence, Apr-13-2021, 20:35:24 GMT

Portfolio Optimization using Reinforcement Learning

Reinforcement learning is arguably the coolest branch of artificial intelligence. It has already proven its prowess, stunning the world by mastering chess and beating world champions at Go and even Dota 2. Using RL for stock trading has always been a holy grail among data scientists. Stock trading has captured our imaginations because of its ease of access and, to misquote Cardi B, we like diamonds and we like dollars. There are several ways of using machine learning for stock trading. One approach is to use forecasting techniques to predict the movement of the stock and build a heuristic-based bot that uses the prediction to make decisions.
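
The forecast-then-heuristic approach from the last sentence can be sketched in a few lines. The moving-average forecast, the window size and the ±1% threshold are hypothetical choices for illustration; this is the non-RL style of bot the paragraph describes, not a strategy taken from the article.

```python
def forecast_next_return(prices, window=3):
    """Naive forecast: mean of the last `window` simple returns."""
    if len(prices) < 2:
        raise ValueError("need at least two prices to compute a return")
    returns = [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]
    recent = returns[-window:]
    return sum(recent) / len(recent)

def heuristic_bot(prices, window=3, threshold=0.01):
    """Map the forecast to an action with a fixed threshold rule."""
    forecast = forecast_next_return(prices, window)
    if forecast > threshold:
        return "buy"
    if forecast < -threshold:
        return "sell"
    return "hold"

# Hypothetical closing prices.
print(heuristic_bot([100, 102, 104, 106]))
print(heuristic_bot([100, 97, 94, 91]))
print(heuristic_bot([100, 100, 100, 100]))
```

The decision rule is hand-written and separate from the forecaster; an RL agent would instead learn the mapping from market state to action directly from a reward such as portfolio return.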

chess, portfolio, upstream oil & gas

#artificialintelligence

Industry:

Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games > Chess (0.57)
Energy > Oil & Gas > Upstream (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)