AITopics

Genre: Research Report (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

#artificialintelligence Dec-28-2019, 10:25:19 GMT

Ubisoft uses AI to teach a car to drive itself in a racing game

Reinforcement learning, an AI training technique that employs rewards to drive software policies toward goals, has been applied successfully to domains from industrial robotics to drug discovery. But while firms including OpenAI and Alphabet's DeepMind have investigated its efficacy in video games like Dota 2, Quake III Arena, and StarCraft 2, few to date have studied its use under constraints like those encountered in the game industry. That's presumably why Ubisoft La Forge, game developer Ubisoft's eponymous prototyping space, proposed in a recent paper an algorithm that's able to handle both discrete and continuous video game actions in a "principled" and predictable way. They set it loose on a "commercial game" (likely The Crew or The Crew 2, though neither is explicitly mentioned) and report that it's competitive with the state of the art on benchmark tasks. "Reinforcement Learning applications in video games have recently seen massive advances coming from the research community, with agents trained to play Atari games from pixels or to be competitive with the best players in the world in complicated imperfect information games," wrote the coauthors of a paper describing the work.

discrete action, ubisoft use ai, video game, (6 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.06)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

#artificialintelligence Dec-28-2019, 10:24:41 GMT

Reimagining Reinforcement Learning – Upside Down

Summary: For all the hype around winning game play and self-driving cars, traditional Reinforcement Learning (RL) has yet to deliver as a reliable tool for ML applications. Here we explore the main drawbacks as well as an innovative approach to RL that dramatically reduces the training compute requirement and time to train. Ever since Reinforcement Learning (RL) was recognized as a legitimate third style of machine learning alongside supervised and unsupervised learning, we've been waiting for that killer app to prove its value. Yes, RL has had some press-worthy wins in game play (AlphaGo), self-driving cars (not here yet), drone control, and even dialogue systems like personal assistants, but the big breakthrough isn't here yet. RL ought to be our go-to solution for any problem requiring sequential decisions, and these individual successes might make you think that RL is ready for prime time, but the reality is that it's not.

application, reinforcement learning, upside, (12 more...)

Country: Europe > Switzerland (0.05)

Genre: Research Report (0.36)

Industry: Information Technology (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Loon, Keng Wah, Graesser, Laura, Cvitkovic, Milan

SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning

arXiv.org Artificial Intelligence Dec-28-2019

We introduce SLM Lab, a software framework for reproducible reinforcement learning (RL) research. SLM Lab implements a number of popular RL algorithms, provides synchronous and asynchronous parallel experiment execution, hyperparameter search, and result analysis. RL algorithms in SLM Lab are implemented in a modular way such that differences in algorithm performance can be confidently ascribed to differences between algorithms, not between implementations. In this work we present the design choices behind SLM Lab and use it to produce a comprehensive single-codebase RL algorithm benchmark. In addition, as a consequence of SLM Lab's modular design, we introduce and evaluate a discrete-action variant of the Soft Actor-Critic algorithm (Haarnoja et al., 2018) and a hybrid synchronous/asynchronous training method for RL agents.

algorithm, reinforcement, slm lab, (16 more...)

1912.12482

Country:

North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.66)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine Learning Dec-27-2019

Crowdfunding Dynamics Tracking: A Reinforcement Learning Approach

Wang, Jun, Zhang, Hefu, Liu, Qi, Pan, Zhen, Tao, Hanqing

Recent years have witnessed increasing interest in research on crowdfunding mechanisms. In this area, dynamics tracking is a significant issue but is still underexplored. Existing studies either fit the fluctuations of time series or employ regularization terms to constrain learned tendencies. However, few of them take into account the inherent decision-making process between investors and crowdfunding dynamics. To address the problem, in this paper, we propose a Trajectory-based Continuous Control for Crowdfunding (TC3) algorithm to predict the funding progress in crowdfunding. Specifically, actor-critic frameworks are employed to model the relationship between investors and campaigns, where all of the investors are viewed as an agent that can interact with the environment derived from the real dynamics of campaigns. Then, to further explore the in-depth implications of patterns (i.e., typical characters) in funding series, we propose to subdivide them into fast-growing and slow-growing ones. Moreover, to switch between the different kinds of patterns, the actor component of TC3 is extended with a structure of options, which yields TC3-Options. Finally, extensive experiments on the Indiegogo dataset not only demonstrate the effectiveness of our methods, but also validate our assumption that the entire pattern learned by TC3-Options is indeed the U-shaped one.

funding progress, tc3-option, u-shaped pattern, (15 more...)

1912.12016

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Anhui Province (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance (1.00)
Information Technology > Services > e-Commerce Services (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Dec-27-2019

Observational Overfitting in Reinforcement Learning

Song, Xingyou, Jiang, Yiding, Tu, Stephen, Du, Yilun, Neyshabur, Behnam

A major component of overfitting in model-free reinforcement learning (RL) involves the case where the agent may mistakenly correlate reward with certain spurious features from the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks by modifying only the observation space of an MDP. When an agent overfits to different observation spaces even if the underlying MDP dynamics are fixed, we term this observational overfitting. Our experiments expose intriguing properties, especially with regard to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL).
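The idea of changing only the observation space while leaving the MDP fixed can be illustrated with a small sketch. This is an assumption-laden toy and not the paper's benchmark construction: the function name `observe`, the `level_seed` parameter, and the random linear map used for the spurious features are all hypothetical.

```python
import random


def observe(state, level_seed, n_spurious=3):
    """Return the true state followed by level-specific spurious features.

    The first len(state) entries are identical on every level; the remaining
    entries come from a random linear map that depends only on level_seed,
    so they carry no extra information about dynamics or reward.
    """
    rng = random.Random(level_seed)
    # Hypothetical level-dependent weights, one row per spurious feature.
    weights = [[rng.uniform(-1.0, 1.0) for _ in state] for _ in range(n_spurious)]
    spurious = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return list(state) + spurious


state = [0.5, -1.0]
train_obs = observe(state, level_seed=0)  # observation on a "training" level
test_obs = observe(state, level_seed=1)   # same state seen on a held-out level
```

An agent that keys on the trailing entries of `train_obs` would see different values for the same underlying state on the held-out level, which is the failure mode the abstract calls observational overfitting.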

generalization, international conference, regularization, (13 more...)

1912.02975

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Sweden > Stockholm > Stockholm (0.05)
(6 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligence Dec-26-2019, 15:23:30 GMT

Training Reinforcement Learning Agents to Ask the Right Questions

That paradigm assumes that the target knowledge is already embedded in the dataset and doesn't require any further clarification, but that rarely resembles how humans learn. When presented with a new subject, we are constantly forced to ask questions and seek clarifications about it. What if we could build the same skill into artificial intelligence (AI) models? The ability to formulate questions is a fundamental element of the human cognition process. Human dialog relies on our ability to express questions in a myriad of ways in order to obtain a specific answer.

answer selection model, google, training reinforcement learning agent, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Jha, Devesh, Raghunathan, Arvind, Romeres, Diego

Quasi-Newton Trust Region Policy Optimization

arXiv.org Artificial Intelligence Dec-26-2019

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks, including the lack of a step-size selection criterion and slow convergence. We investigate the use of a trust region method using a dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.
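The dogleg step named in the abstract can be illustrated with a minimal sketch of the textbook dogleg rule for a quadratic model. It is not the authors' implementation: the function and argument names are hypothetical, and passing both a Hessian approximation `B` and its inverse `H` (as a quasi-Newton method might maintain) is an assumption made to keep the example free of linear solves. `B` is assumed positive definite.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def norm(v):
    return math.sqrt(dot(v, v))


def dogleg_step(g, B, H, radius):
    """Approximately minimize g.p + 0.5 * p.B.p subject to ||p|| <= radius."""
    # Full quasi-Newton step; accept it if it lies inside the trust region.
    p_newton = [-x for x in matvec(H, g)]
    if norm(p_newton) <= radius:
        return p_newton
    # Cauchy point: minimizer of the model along the steepest-descent direction.
    scale = dot(g, g) / dot(g, matvec(B, g))
    p_cauchy = [-scale * x for x in g]
    if norm(p_cauchy) >= radius:
        # Even the Cauchy point is outside: take steepest descent to the boundary.
        return [-(radius / norm(g)) * x for x in g]
    # Otherwise move from the Cauchy point toward the Newton point and stop
    # where the path crosses the boundary: solve ||p_cauchy + t * d|| = radius.
    d = [n - c for n, c in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - radius ** 2
    t = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + t * di for pc, di in zip(p_cauchy, d)]
```

With a small radius the step reduces to scaled steepest descent, and with a large radius it becomes the full quasi-Newton step, so the radius plays the role of the step-size control that the abstract says plain gradient descent lacks.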

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1912.11912

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

arXiv.org Artificial Intelligence Dec-26-2019

A Survey of Deep Reinforcement Learning in Video Games

Shao, Kun, Tang, Zhentao, Zhu, Yuanheng, Li, Nannan, Zhao, Dongbin

Deep reinforcement learning (DRL) has made great achievements since it was proposed. Generally, DRL agents receive high-dimensional inputs at each step and take actions according to deep-neural-network-based policies. This learning mechanism updates the policy to maximize the return with an end-to-end method. In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties. Besides, DRL plays an important role in game artificial intelligence (AI). We also review the achievements of DRL in various video games, including classical arcade games, first-person perspective games and multi-agent real-time strategy games, from 2D to 3D, and from single-agent to multi-agent. A large number of video game AIs with DRL have achieved super-human performance, while there are still some challenges in this domain. Therefore, we also discuss some key points when applying DRL methods to this field, including exploration-exploitation, sample efficiency, generalization and transfer, multi-agent learning, imperfect information, and delayed sparse rewards, as well as some research directions.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1912.10944

Country:

Europe > Sweden > Skåne County > Malmö (0.05)
Asia > China > Beijing > Beijing (0.04)

Genre:

Overview (1.00)
Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Osiński, Błażej, Jakubowski, Adam, Miłoś, Piotr, Zięcina, Paweł, Galias, Christopher, Homoceanu, Silviu, Michalewski, Henryk

Simulation-based reinforcement learning for real-world autonomous driving

arXiv.org Artificial Intelligence Dec-26-2019

We use synthetic data and a reinforcement learning algorithm to train a system controlling a full-size real-world vehicle in a number of restricted driving scenarios. The driving policy uses RGB images as input. We analyze how design decisions about perception, control and training impact the real-world performance.

experiment, international conference, simulation, (16 more...)

1911.12905

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
(20 more...)

Genre: Research Report (0.82)

Industry:

Automobiles & Trucks (0.65)
Transportation > Ground > Road (0.51)
Information Technology > Robotics & Automation (0.51)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)