AITopics

2004.0261

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Kanervisto, Anssi, Pussinen, Joonas, Hautamäki, Ville

Benchmarking End-to-End Behavioural Cloning on Video Games

arXiv.org Artificial IntelligenceApr-2-2020

Behavioural cloning, where a computer is taught to perform a task based on demonstrations, has been successfully applied to various video games and robotics tasks, with and without reinforcement learning. This also includes end-to-end approaches, where a computer plays a video game like humans do: by looking at the image displayed on the screen, and sending keystrokes to the game. As a general approach to playing video games, this has many inviting properties: no need for specialized modifications to the game, no lengthy training sessions and the ability to re-use the same tools across different games. However, related work includes game-specific engineering to achieve the results. We take a step towards a general approach and study the general applicability of behavioural cloning on twelve video games, including six modern video games (published after 2010), by using human demonstrations as training data. Our results show that these agents cannot match humans in raw performance but can learn human-like behaviour. We also demonstrate how the quality of the data matters, and how recording data from humans is subject to a state-action mismatch, due to human reflexes.

atari grand challenge, dataset, video game, (15 more...)

2004.00981

Country: Europe > Finland > North Karelia > Joensuu (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Kanervisto, Anssi, Scheller, Christian, Hautamäki, Ville

Action Space Shaping in Deep Reinforcement Learning

arXiv.org Artificial IntelligenceApr-2-2020

Reinforcement learning (RL) has been successful in training agents in various learning environments, including video-games. However, such work modifies and shrinks the action space from the game's original. This is to avoid trying "pointless" actions and to ease the implementation. Currently, this is mostly done based on intuition, with little systematic research supporting the design decisions. In this work, we aim to gain insight on these action space modifications by conducting extensive experiments in video-game environments. Our results show how domain-specific removal of actions and discretization of continuous actions can be crucial for successful learning. With these insights, we hope to ease the use of RL in new environments, by clarifying what action-spaces are easy to learn.

action space, agent, discrete, (12 more...)

2004.0098

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
Europe > Finland > North Karelia > Joensuu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Constrained-Space Optimization and Reinforcement Learning for Complex Tasks

Tsai, Ya-Yen, Xiao, Bo, Johns, Edward, Yang, Guang-Zhong

Learning from Demonstration is increasingly used for transferring operator manipulation skills to robots. In practice, it is important to cater for limited data and imperfect human demonstrations, as well as underlying safety constraints. This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks. Through interactions within the constrained space, the reinforcement learning agent is trained to optimize the manipulation skills according to a defined reward function. After learning, the optimal policy is derived from the well-trained reinforcement learning agent, which is then implemented to guide the robot to conduct tasks that are similar to the experts' demonstrations. The effectiveness of the proposed method is verified with a robotic suturing task, demonstrating that the learned policy outperformed the experts' demonstrations in terms of the smoothness of the joint motion and end-effector trajectories, as well as the overall task completion time.

demonstration, robot, trajectory, (14 more...)

doi: 10.1109/LRA.2020.2965392

2004.00716

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Muller-Brockhausen, Matthias, Preuss, Mike, Plaat, Aske

A New Challenge: Approaching Tetris Link with AI

Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that is still lacking any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: Reinforcement Learning, Monte Carlo tree search. We document our approach and report on their relative performance in a tournament. Curiously, the heuristic approach is stronger than the planning/learning approaches. However, experienced human players easily win the majority of the matches against the heuristic planning AIs. We, therefore, surmise that Tetris Link is more difficult than expected. We offer our findings to the community as a challenge to improve upon.

agent, mct, tetris link, (17 more...)

2004.00377

Country:

North America > Canada > Alberta (0.14)
Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningApr-1-2020

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Wang, Yu, Roohi, Nima, West, Matthew, Viswanathan, Mahesh, Dullerud, Geir E.

Probabilistic Computation Tree Logic (PCTL) is frequently used to formally specify control objectives such as probabilistic reachability and safety. In this work, we focus on model checking PCTL specifications statistically on Markov Decision Processes (MDPs) by sampling, e.g., checking whether there exists a feasible policy such that the probability of reaching certain goal states is greater than a threshold. We use reinforcement learning to search for such a feasible policy for PCTL specifications, and then develop a statistical model checking (SMC) method with provable guarantees on its error. Specifically, we first use upper-confidence-bound (UCB) based Q-learning to design an SMC algorithm for bounded-time PCTL specifications, and then extend this algorithm to unbounded-time specifications by identifying a proper truncation time by checking the PCTL specification and its negation at the same time. Finally, we evaluate the proposed method on case studies.

algorithm, probability, specification, (12 more...)

arXiv.org Machine Learning

2004.00273

Country:

North America > United States > Illinois (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)

Arakawa, Riku, Shiba, Shintaro

Exploration of Reinforcement Learning for Event Camera using Car-like Robots

We demonstrate the first reinforcement-learning application for robots equipped with an event camera. Because of the considerably lower latency of the event camera, it is possible to achieve much faster control of robots compared with the existing vision-based reinforcement-learning applications using standard cameras. To handle a stream of events for reinforcement learning, we introduced an image-like feature and demonstrated the feasibility of training an agent in a simulator for two tasks: fast collision avoidance and obstacle tracking. Finally, we set up a robot with an event camera in the real world and then transferred the agent trained in the simulator, resulting in successful fast avoidance of randomly thrown objects. Incorporating event camera into reinforcement learning opens new possibilities for various robotics applications that require swift control, such as autonomous vehicles and drones, through end-to-end learning approaches.

agent, event camera, reinforcement, (14 more...)

2004.00801

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.46)
Information Technology (0.46)
Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning

Pleines, Marco, Jitsev, Jenia, Preuss, Mike, Zimmer, Frank

The Obstacle Tower Challenge is the task to master a procedurally generated chain of levels that subsequently get harder to complete. Whereas the top 6 performing entries of last year's competition all used human demonstrations to learn how to cope with the challenge, we present an approach that performed competitively (placed 7th) but starts completely from scratch by means of Deep Reinforcement Learning with a relatively simple feed-forward deep network structure. We especially look at the generalization performance of the taken approach concerning different seeds and various visual themes that have become available after the competition, and investigate where the agent fails and why. Note that our approach does not possess a short-term memory like employing recurrent hidden states. With this work, we hope to contribute to a better understanding of what is possible with a relatively simple, flexible solution that can be applied to learning in environments featuring complex 3D visual input where the abstract task structure itself is still fairly simple.

agent, generalization, reinforcement learning, (14 more...)

2004.00567

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations

Zhu, Zhuangdi, Lin, Kaixiang, Dai, Bo, Zhou, Jiayu

Model-free deep reinforcement learning (RL) has demonstrated its superiority on many complex sequential decision-making problems. However, heavy dependence on dense rewards and high sample-complexity impedes the wide adoption of these methods in real-world scenarios. On the other hand, imitation learning (IL) learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations. In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive, and the quality of demonstrations typically limits the performance of the learning policy. In this work, we propose Self-Adaptive Imitation Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations for highly challenging sparse reward tasks. SAIL bridges the advantages of IL and RL to reduce the sample complexity substantially, by effectively exploiting sup-optimal demonstrations and efficiently exploring the environment to surpass the demonstrated performance. Extensive empirical results show that not only does SAIL significantly improve the sample-efficiency but also leads to much better final performance across different continuous control tasks, comparing to the state-of-the-art.

demonstration, exploration, objective, (14 more...)

2004.0053

Country:

North America > United States > Michigan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceMar-31-2020

Suphx: Mastering Mahjong with Deep Reinforcement Learning

Li, Junjie, Koyamada, Sotetsu, Ye, Qiwei, Liu, Guoqing, Wang, Chao, Yang, Ruihan, Zhao, Li, Qin, Tao, Liu, Tie-Yan, Hon, Hsiao-Wuen

Artificial Intelligence (AI) has achieved great success in many domains, and game AI is widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StartCraft II). Mahjong is a popular multi-player imperfect-information game worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players in the Tenhou platform. This is the first time that a computer program outperforms most top human players in Mahjong.

agent, mahjong, suphx, (17 more...)

2003.1359

Country:

North America > United States > Texas (0.44)
Asia > China (0.04)
Europe (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)