AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Reinforcement Learning – Reward for Learning

#artificialintelligenceApr-29-2018, 21:31:54 GMT

RL compared with a scenario like "how some new born baby animals learns to stand, run, and survive in the given environment." Reinforcement Learning (RL) is more general than supervised learning or unsupervised learning. It learn from interaction with environment to achieve a goal or simply learns from reward and punishments. In other words algorithms learns to react to the environment. TD-learning seems to be closest to how humans learn in this type of situation, but Q-learning and others also have their own advantages.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

Dai, Zihang, Xie, Qizhe, Hovy, Eduard

arXiv.org Machine LearningApr-29-2018

In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularized reinforcement learning. Inspired by the connection, we propose two sequence prediction algorithms, one extending RAML with fine-grained credit assignment and the other improving Actor-Critic with a systematic entropy regularization. On two benchmark datasets, we show the proposed algorithms outperform RAML and Actor-Critic respectively, providing new alternatives to sequence prediction.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1804.10974

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Crawling in Rogue's dungeons with (partitioned) A3C

Asperti, Andrea, Cortesi, Daniele, Sovrano, Francesco

arXiv.org Machine LearningApr-29-2018

Rogue is a famous dungeon-crawling video-game of the 80ies, the ancestor of its gender. Rogue-like games are known for the necessity to explore partially observable and always different randomly-generated labyrinths, preventing any form of level replay. As such, they serve as a very natural and challenging task for reinforcement learning, requiring the acquisition of complex, non-reactive behaviors involving memory and planning. In this article we show how, exploiting a version of A3C partitioned on different situations, the agent is able to reach the stairs and descend to the next level in 98% of cases.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1804.08685

Country: North America > Canada (0.68)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Safety-Aware Apprenticeship Learning

Zhou, Weichao, Li, Wenchao

arXiv.org Artificial IntelligenceApr-28-2018

Apprenticeship learning (AL) is a kind of Learning from Demonstration techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting its learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1710.07983

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

TDM: From Model-Free to Model-Based Deep Reinforcement Learning

#artificialintelligenceApr-27-2018, 19:47:17 GMT

You've decided that you want to bike from your house by UC Berkeley to the Golden Gate Bridge. To make matters worse, you are new to the Bay Area, and all you have is a good ol' fashion map to guide you. How do you get started? Let's first figure out how to ride a bike. One strategy would be to do a lot of studying and planning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country: Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Introduction to Learning to Trade with Reinforcement Learning

@machinelearnbotApr-27-2018, 19:40:56 GMT

The academic Deep Learning research community has largely stayed away from the financial markets. Maybe that's because the finance industry has a bad reputation, the problem doesn't seem interesting from a research perspective, or because data is difficult and expensive to obtain. In this post, I'm going to argue that training Reinforcement Learning agents to trade in the financial (and cryptocurrency) markets can be an extremely interesting research problem. I believe that it has not received enough attention from the research community but has the potential to push the state-of-the art of many related fields. It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field. This is not a "price prediction using Deep Learning" post. So, if you're looking for example code and models you may be disappointed. Instead, I want to talk on a more high level about why learning to trade using Machine Learning is difficult, what some of the challenges are, and where I think Reinforcement Learning fits in. If there's enough interest in this area I may follow up with another post that includes concrete examples. I expect most readers to have no background in trading, just like I didn't, so I will start out with covering some of the basics.

machine learning, order book, reinforcement learning, (19 more...)

@machinelearnbot

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Microsoft/AirSim

#artificialintelligenceApr-27-2018, 02:31:04 GMT

AirSim is a simulator for drones, cars and more built on Unreal Engine. It is open-source, cross platform and supports hardware-in-loop with popular flight controllers such as PX4 for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped in to any Unreal environment you want. Our goal is to develop AirSim as a platform for AI research to experiment with deep learning, computer vision and reinforcement learning algorithms for autonomous vehicles. For this purpose, AirSim also exposes APIs to retrieve data and control vehicles in a platform independent way.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

Add feedback

A Hybrid Q-Learning Sine-Cosine-based Strategy for Addressing the Combinatorial Test Suite Minimization Problem

Zamli, Kamal Z., Din, Fakhrud, Ahmed, Bestoun S., Bures, Miroslav

arXiv.org Artificial IntelligenceApr-27-2018

The sine-cosine algorithm (SCA) is a new population-based meta-heuristic algorithm. In addition to exploiting sine and cosine functions to perform local and global searches (hence the name sine-cosine), the SCA introduces several random and adaptive parameters to facilitate the search process. Although it shows promising results, the search process of the SCA is vulnerable to local minima/maxima due to the adoption of a fixed switch probability and the bounded magnitude of the sine and cosine functions (from -1 to 1). In this paper, we propose a new hybrid Q-learning sine-cosine- based strategy, called the Q-learning sine-cosine algorithm (QLSCA). Within the QLSCA, we eliminate the switching probability. Instead, we rely on the Q-learning algorithm (based on the penalty and reward mechanism) to dynamically identify the best operation during runtime. Additionally, we integrate two new operations (L\'evy flight motion and crossover) into the QLSCA to facilitate jumping out of local minima/maxima and enhance the solution diversity. To assess its performance, we adopt the QLSCA for the combinatorial test suite minimization problem. Experimental results reveal that the QLSCA is statistically superior with regard to test suite size reduction compared to recent state-of-the-art strategies, including the original SCA, the particle swarm test generator (PSTG), adaptive particle swarm optimization (APSO) and the cuckoo search strategy (CS) at the 95% confidence level. However, concerning the comparison with discrete particle swarm optimization (DPSO), there is no significant difference in performance at the 95% confidence level. On a positive note, the QLSCA statistically outperforms the DPSO in certain configurations at the 90% confidence level.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1371/journal.pone.0195675

1805.00873

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.93)

Add feedback

Decoupling Dynamics and Reward for Transfer Learning

Zhang, Amy, Satija, Harsh, Pineau, Joelle

arXiv.org Machine LearningApr-27-2018

Current reinforcement learning (RL) methods can successfully learn single tasks but often generalize poorly to modest perturbations in task domain or training procedure. In this work, we present a decoupled learning strategy for RL that creates a shared representation space where knowledge can be robustly transferred. We separate learning the task representation, the forward dynamics, the inverse dynamics and the reward function of the domain, and show that this decoupling improves performance within the task, transfers well to changes in dynamics and reward, and can be effectively used for online planning. Empirical results show good performance in both continuous and discrete RL domains.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1804.10689

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

Deep Reinforcement Learning to Acquire Navigation Skills for Wheel-Legged Robots in Complex Environments

Chen, Xi, Ghadirzadeh, Ali, Folkesson, John, Jensfelt, Patric

arXiv.org Machine LearningApr-27-2018

Mobile robot navigation in complex and dynamic environments is a challenging but important problem. Reinforcement learning approaches fail to solve these tasks efficiently due to reward sparsities, temporal complexities and high-dimensionality of sensorimotor spaces which are inherent in such problems. We present a novel approach to train action policies to acquire navigation skills for wheel-legged robots using deep reinforcement learning. The policy maps height-map image observations to motor commands to navigate to a target position while avoiding obstacles. We propose to acquire the multifaceted navigation skill by learning and exploiting a number of manageable navigation behaviors. We also introduce a domain randomization technique to improve the versatility of the training samples. We demonstrate experimentally a significant improvement in terms of data-efficiency, success rate, robustness against irrelevant sensory data, and also the quality of the maneuver skills.

machine learning, reinforcement learning, trajectory, (13 more...)

arXiv.org Machine Learning

1804.105

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback