AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
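
The trial-and-error idea in the quotation can be illustrated with a minimal sketch. It assumes a multi-armed bandit with Gaussian rewards and an epsilon-greedy agent. Neither appears in the quoted passage, and the function name `run_bandit` and its parameters are hypothetical. The agent is never told which action is best; it only sees the rewards of the actions it tries.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Gaussian bandit (illustrative assumption)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # running estimate of each action's mean reward
    counts = [0] * n_actions        # number of times each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment returns a noisy numerical reward.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental sample-average update of the estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.1, 0.5, 3.0])
    print(counts, [round(e, 2) for e in estimates])
```

With a small `epsilon` the agent spends most steps on the action it currently believes is best, while occasional random tries keep the other estimates from going stale.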

r/MachineLearning - [D] What deep learning papers should I implement to learn?

#artificialintelligence - Jul-3-2018, 00:55:09 GMT

"A Neural Algorithm of Artistic Style" is very intuitive to understand and not terribly difficult to get going. Plus you don't need crazy hardware as you work with pre-trained models. "Human Level Control Through Deep Reinforcement Learning" is much more complicated, but very rewarding when you get it right as you can watch a machine learn to play your favorite childhood games. And, you'll get a strong grasp of your framework of choice, good debugging techniques, and how to effectively leverage training time on a back-end.

deep learning paper, machine learning, reinforcement learning, (2 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Solving Atari Games Using Fractals And Entropy

Cerezo, Sergio Hernandez, Ballester, Guillem Duran, Baxevanakis, Spiros

arXiv.org Artificial Intelligence - Jul-3-2018

In this paper we introduce a novel MCTS-based approach that is derived from the laws of thermodynamics. The algorithm, coined Fractal Monte Carlo (FMC), allows us to create an agent that takes intelligent actions in both continuous and discrete environments while providing control over every aspect of the agent's behavior. Results show that FMC is several orders of magnitude more efficient than similar techniques, such as MCTS, in the Atari games tested.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1807.01081

Country: Europe > Spain > Region of Murcia > Murcia (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Region Growing Curriculum Generation for Reinforcement Learning

Molchanov, Artem, Hausman, Karol, Birchfield, Stan, Sukhatme, Gaurav

arXiv.org Artificial Intelligence - Jul-3-2018

Learning a policy capable of moving an agent between any two states in the environment is important for many robotics problems involving navigation and manipulation. Due to the sparsity of rewards in such tasks, applying reinforcement learning in these scenarios can be challenging. Common approaches for tackling this problem include reward engineering with auxiliary rewards, requiring domain-specific knowledge or changing the objective. In this work, we introduce a method based on region-growing that allows learning in an environment with any pair of initial and goal states. Our algorithm first learns how to move between nearby states and then increases the difficulty of the start-goal transitions as the agent's performance improves. This approach creates an efficient curriculum for learning the objective behavior of reaching any goal from any initial state. In addition, we describe a method to adaptively adjust expansion of the growing region that allows automatic adjustment of the key exploration hyperparameter to environments with different requirements. We evaluate our approach on a set of simulated navigation and manipulation tasks, where we demonstrate that our algorithm can efficiently learn a policy in the presence of sparse rewards.
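
The curriculum idea in this abstract (start with nearby start-goal pairs, then widen them as performance improves) can be sketched as follows. This is not the authors' algorithm. It assumes a 1-D chain of states and a fixed success-rate threshold, and the class `GrowingRegionCurriculum` and all of its parameters are hypothetical.

```python
import random

class GrowingRegionCurriculum:
    """Samples start-goal pairs on a 1-D chain of states and widens the
    allowed start-goal distance when recent success is high (hypothetical)."""

    def __init__(self, n_states, success_threshold=0.8, window=20, seed=0):
        if n_states < 2:
            raise ValueError("need at least two states")
        self.n_states = n_states
        self.radius = 1                      # current maximum start-goal distance
        self.success_threshold = success_threshold
        self.window = window                 # episodes per evaluation window
        self.results = []                    # outcomes in the current window
        self.rng = random.Random(seed)

    def sample_task(self):
        """Return a (start, goal) pair whose distance is at most self.radius."""
        start = self.rng.randrange(self.n_states)
        lo = max(0, start - self.radius)
        hi = min(self.n_states - 1, start + self.radius)
        goal = self.rng.choice([s for s in range(lo, hi + 1) if s != start])
        return start, goal

    def report(self, success):
        """Record an episode outcome; grow the region after a good window."""
        self.results.append(bool(success))
        if len(self.results) >= self.window:
            rate = sum(self.results) / len(self.results)
            if rate >= self.success_threshold and self.radius < self.n_states - 1:
                self.radius += 1             # make start-goal transitions harder
            self.results = []                # start a fresh window either way
```

A training loop would call `sample_task()` before each episode and `report(success)` after it. The abstract also mentions adapting the expansion rate automatically, which this fixed-threshold sketch does not attempt.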

machine learning, reinforcement learning, variance, (17 more...)

arXiv.org Artificial Intelligence

1807.01425

Country: North America > United States > California (0.14)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.54)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Playing against Nature: causal discovery for decision making under uncertainty

Gonzalez-Soto, M., Sucar, L. E., Escalante, H. J.

arXiv.org Artificial Intelligence - Jul-3-2018

We consider decision problems under uncertainty where the options available to a decision maker and the resulting outcome are related through a causal mechanism which is unknown to the decision maker. We ask how a decision maker can learn about this causal mechanism through sequential decision making, and how she can use her current causal knowledge in each round to make better choices than she would have made without it. We propose a decision-making procedure in which an agent holds beliefs about her environment, which are used to make a choice and are updated using the observed outcome. As proof of concept, we present an implementation of this causal decision-making model and apply it in a simple scenario. We show that the model achieves performance similar to classic Q-learning while also acquiring a causal model of the environment.
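
For readers unfamiliar with the Q-learning baseline named in this abstract, the standard tabular update can be sketched as below. This shows only the textbook baseline, not the paper's causal model. The function name, the dictionary-based table and the default `alpha` and `gamma` values are illustrative assumptions.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to value estimates; missing entries count as 0.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target by step size alpha.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

if __name__ == "__main__":
    table = {("s1", "a"): 2.0}
    print(q_learning_update(table, "s0", "a", 1.0, "s1", ["a", "b"], alpha=0.5))
```

The update stores only value estimates and learns no explicit model of how actions cause outcomes, which is the gap the abstract's causal approach is described as addressing.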

decision maker, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1807.01268

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > Mexico (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

Jaderberg, Max, Czarnecki, Wojciech M., Dunning, Iain, Marris, Luke, Lever, Guy, Castaneda, Antonio Garcia, Beattie, Charles, Rabinowitz, Neil C., Morcos, Ari S., Ruderman, Avraham, Sonnerat, Nicolas, Green, Tim, Deason, Louise, Leibo, Joel Z., Silver, David, Hassabis, Demis, Kavukcuoglu, Koray, Graepel, Thore

arXiv.org Machine Learning - Jul-3-2018

Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. In this work, we demonstrate for the first time that an agent can achieve human-level performance in a popular 3D multiplayer first-person video game, Quake III Arena Capture the Flag, using only pixels and game points as input. These results were achieved by a novel two-tier optimisation process in which a population of independent RL agents is trained concurrently from thousands of parallel matches, with agents playing in teams together and against each other on randomly generated environments. Each agent in the population learns its own internal reward signal to complement the sparse delayed reward from winning, and selects actions using a novel temporally hierarchical representation that enables the agent to reason at multiple timescales. During game-play, these agents display human-like behaviours such as navigating, following, and defending based on a rich learned representation that is shown to encode high-level game knowledge. In an extensive tournament-style evaluation the trained agents exceeded the win-rate of strong human players both as teammates and opponents, and proved far stronger than existing state-of-the-art agents. These results demonstrate a significant jump in the capabilities of artificial agents, bringing us closer to the goal of human-level intelligence.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1807.01281

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Goal-Oriented Visual Dialog Agents via Advanced Recurrent Nets with Tempered Policy Gradient

Zhao, Rui, Tresp, Volker

arXiv.org Artificial Intelligence - Jul-2-2018

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, training text-generating agents efficiently is still a considerable challenge. Commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). These methods encourage exploration with different temperature control strategies. We derive three variations of the TPGs and show their superior performance on a recently published AI testbed, the GuessWhat?! game. On the testbed, we achieve significant improvements with two innovations. The first is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 9%. The second is the application of our newly developed TPG methods, which improves the performance by a further 5% or so and, even more importantly, helps produce more convincing utterances. TPG can easily be applied to any goal-oriented dialogue system.
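
The abstract does not give the TPG formulas, so the sketch below shows only the generic mechanism that temperature-based exploration usually relies on: dividing the policy logits by a temperature before the softmax. Higher temperatures flatten the action distribution and so encourage exploration. The function names and the example logits are hypothetical, and the sketch is not the paper's method.

```python
import math
import random

def tempered_softmax(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, temperature=1.0, rng=random):
    """Draw one action index from the tempered distribution."""
    probs = tempered_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(tempered_softmax(logits, temperature=1.0))
    print(tempered_softmax(logits, temperature=5.0))  # flatter distribution
```

In a policy gradient setting, sampling from the flatter distribution makes the agent try lower-probability utterances more often. How the temperature is scheduled or chosen per step is where the three TPG variants would differ, and that detail is not recoverable from the abstract.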

machine learning, natural language, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1807.00737

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Speeding up the Metabolism in E-commerce by Reinforcement Mechanism Design

He, Hua-Lin, Pan, Chun-Xiang, Da, Qing, Zeng, An-Xiang

arXiv.org Machine Learning - Jul-1-2018

In a large E-commerce platform, all the participants compete for impressions under the allocation mechanism of the platform. Existing methods mainly focus on the short-term return based on the current observations instead of the long-term return. In this paper, we formally establish the lifecycle model for products by defining the introduction, growth, maturity and decline stages and their transitions throughout the whole life period. Based on this model, we further propose a reinforcement-learning-based mechanism design framework for impression allocation, which incorporates first-principal-component-based permutation and a novel experience generation method, to maximize the short-term as well as the long-term return of the platform. With the power of trial and error, it is possible to optimize impression allocation strategies globally, which contributes to the healthy development of the participants and the platform itself. We evaluate our algorithm on a simulated environment built on one of the largest E-commerce platforms, and a significant improvement has been achieved in comparison with the baseline solutions.

machine learning, platform, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1807.00448

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Services > e-Commerce Services (0.93)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control

Wang, JunPing, Zhang, WenSheng, Thomas, Ian, Duan, ShiHui, Shi, YouKang

arXiv.org Artificial Intelligence - Jul-1-2018

Generating sequential decision processes from large amounts of measured process data is a future research direction for collaborative factory automation: it makes full use of online or offline process data to directly design flexible decision-making policies and evaluate their performance. The key challenges for the sequential decision process are generating the sequential decision-making policy online and directly, and transferring knowledge across task domains. Most multi-task policy-generating algorithms suffer from insufficient cross-task sharing structure when applied to discrete-time nonlinear systems. This paper proposes multi-task generative adversarial nets with shared memory for cross-domain coordination control, which can generate sequential decision policies directly from the raw sensory input of all tasks and evaluate the performance of system actions online in discrete-time nonlinear systems. Experiments were conducted on a professional flexible manufacturing testbed deployed within a smart factory of Weichai Power in China. Results on three groups of discrete-time nonlinear control tasks show that the proposed model can effectively improve performance on a task with the help of other related tasks.

decision support system, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1807.00298

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.47)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots (0.95)
(2 more...)

Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

Wang, Baoxiang, Sun, Tongfang, Zheng, Sam Xianjun

arXiv.org Artificial Intelligence - Jul-1-2018

In recent years, reinforcement learning (RL) methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go, and Poker. However, those studies mostly focus on winning the game and have largely ignored the rich and complex human motivations, which are essential for understanding different players' diverse behaviors. In this paper, we present a novel method called Multi-Motivation Behavior Modeling (MMBM) that takes multifaceted human motivations into consideration and models the underlying value structure of the players using inverse RL. Our approach does not require access to the dynamics of the system, making it feasible to model complex interactive environments such as massively multiplayer online games. MMBM is tested on the World of Warcraft Avatar History dataset, which recorded the gameplay of over 70,000 users over a three-year period. Our model reveals significant differences in value structures among different player groups. Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players' behaviors.

machine learning, motivation, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1807.00366

Country: North America > United States > Texas (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to Drive in a Day

Kendall, Alex, Hawke, Jeffrey, Janz, David, Mazur, Przemyslaw, Reda, Daniele, Allen, John-Mark, Lam, Vinh-Dieu, Bewley, Alex, Shah, Amar

arXiv.org Artificial Intelligence - Jul-1-2018

We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy-to-obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities of scaling this approach to a broader range of autonomous driving tasks.

machine learning, reinforcement, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1807.00412

Country:

Europe > United Kingdom (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
