AITopics

arXiv.org Machine LearningMar-8-2019

Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning

Liu, Ziyu, Zhou, Meng, Cao, Weiqing, Qu, Qiang, Yeung, Henry Wing Fung, Chung, Vera Yuk Ying

The game of Chinese Checkers is a challenging traditional board game of perfect information that differs from other traditional games in two main aspects: first, unlike Chess, all checkers remain indefinitely in the game and hence the branching factor of the search tree does not decrease as the game progresses; second, unlike Go, there are also no upper bounds on the depth of the search tree since repetitions and backward movements are allowed. Therefore, even in a restricted game instance, the state-space of the game can still be unbounded, making it challenging for a computer program to excel. In this work, we present an approach that effectively combines the use of heuristics, Monte Carlo tree search, and deep reinforcement learning for building a Chinese Checkers agent without the use of any human game-play data. Experiment results show that our agent is competent under different scenarios and reaches the level of experienced human players.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1903.01747

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Leisure & Entertainment > Games > Chinese Checkers (0.81)
Leisure & Entertainment > Games > Checkers (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Akrout, Mohamed, Farahmand, Amir-massoud, Jarmain, Tory, Abid, Latif

Improving Skin Condition Classification with a Visual Symptom Checker trained using Reinforcement Learning

arXiv.org Artificial IntelligenceMar-8-2019

We present a visual symptom checker that combines a pre-trained Convolutional Neural Network (CNN) with a Reinforcement Learning (RL) agent as a Question Answering (QA) model. This method enables us to not only increase the classification confidence and accuracy of the visual symptom checker, but also decreases the average number of relevant questions asked to narrow down the differential diagnosis. By combining the CNN output in the form of classification probabilities as a part of the state structure of the simulated patient's environment, a DQN-based RL agent learns to ask the best symptom that maximizes its expected return over symptoms. We demonstrate that our RL approach increases the accuracy more than 20% as compared to the CNN alone, and up to 10% as compared to the decision tree model. We finally show that the RL approach not only outperforms the performance of the decision tree approach but also narrows down the diagonosis faster in terms of the average number of asked questions.

decision tree approach, machine learning, reinforcement learning, (16 more...)

1903.03495

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Dermatology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Kearney, Alex, Veeriah, Vivek, Travnik, Jaden, Pilarski, Patrick M., Sutton, Richard S.

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning

There is a long history of using meta learning as representation learning, specifically for determining the relevance of inputs. In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent---building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks. In particular, we focus on stochastic meta-descent introduced in the Incremental Delta-Bar-Delta (IDBD) algorithm for setting individual step sizes for each feature of a linear function approximator. Using IDBD, a feature with large or small step sizes will have a large or small impact on generalization from training examples. As a main contribution of this work, we extend IDBD to temporal-difference (TD) learning---a form of learning which is effective in sequential, non i.i.d. problems. We derive a variety of IDBD generalizations for TD learning, demonstrating that they are able to distinguish which features are relevant and which are not. We demonstrate that TD IDBD is effective at learning feature relevance in both an idealized gridworld and a real-world robotic prediction task.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1903.03252

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre:

Workflow (0.48)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

MinAtar: An Atari-inspired Testbed for More Efficient Reinforcement Learning Experiments

Young, Kenny, Tian, Tian

The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games are varied, showcase aspects of competency we expect from an intelligent agent, and are not biased towards any particular solution approach. The challenge of the ALE includes 1) the representation learning problem of extracting pertinent information from the raw pixels, and 2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, in reinforcement learning research, we care more about the latter, but the representation learning problem adds significant computational expense. In response, we introduce MinAtar, short for miniature Atari, a new evaluation platform that captures the general mechanics of specific Atari games, while simplifying certain aspects. In particular, we reduce the representational complexity to focus more on behavioural challenges. MinAtar consists of analogues to five Atari games which play out on a 10x10 grid. MinAtar provides a 10x10xn state representation. The n channels correspond to game-specific objects, such as ball, paddle and brick in the game Breakout. While significantly simplified, these domains are still rich enough to allow for interesting behaviours. To demonstrate the challenges posed by these domains, we evaluated a smaller version of the DQN architecture. We also tried variants of DQN without experience replay, and without a target network, to assess the impact of those two prominent components in the MinAtar environments. In addition, we evaluated a simpler agent that used actor-critic with eligibility traces, online updating, and no experience replay. We hope that by introducing a set of simplified, Atari-like games we can allow researchers to more efficiently investigate the unique behavioural challenges provided by the ALE.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1903.03176

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wu, Bohan, Akinola, Iretiayo, Allen, Peter K.

Pixel-Attentive Policy Gradient for Multi-Fingered Grasping in Cluttered Scenes

Recent advances in on-policy reinforcement learning (RL) methods enabled learning agents in virtual environments to master complex tasks with high-dimensional and continuous observation and action spaces. However, leveraging this family of algorithms in multi-fingered robotic grasping remains a challenge due to large sim-to-real fidelity gaps and the high sample complexity of on-policy RL algorithms. This work aims to bridge these gaps by first reinforcement-learning a multi-fingered robotic grasping policy in simulation that operates in the pixel space of the input: a single depth image. Using a mapping from pixel space to Cartesian space according to the depth map, this method transfers to the real world with high fidelity and introduces a novel attention mechanism that substantially improves grasp success rate in cluttered environments. Finally, the direct-generative nature of this method allows learning of multi-fingered grasps that have flexible end-effector positions, orientations and rotations, as well as all degrees of freedom of the hand.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1903.03227

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Saxena, Vaibhav, Sivanandan, Srinivasan, Mathur, Pulkit

Dyna-AIL : Adversarial Imitation Learning by Planning

Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions for convergence. In this paper, we propose an end-to-end differentiable adversarial imitation learning algorithm in a Dyna-like framework for switching between model-based planning and model-free learning from expert data. Our results on both discrete and continuous environments show that our approach of using model-based planning along with model-free learning converges to an optimal policy with fewer number of environment interactions in comparison to the state-of-the-art learning methods.

machine learning, reinforcement learning, trajectory, (16 more...)

1903.03234

Country:

North America > Canada > Ontario > Toronto (0.31)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Abramova, Ekaterina, Dickens, Luke, Kuhn, Daniel, Faisal, Aldo

RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems

arXiv.org Machine LearningMar-7-2019

Nonlinear optimal control problems are often solved with numerical methods that require knowledge of system's dynamics which may be difficult to infer, and that carry a large computational cost associated with iterative calculations. We present a novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control, which operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces. Our approach is inspired by research at two levels of abstraction: first, at the level of limb coordination human behaviour is explained by linear optimal feedback control theory. Second, in cognitive tasks involving learning symbolic level action selection, humans learn such problems using model-free and model-based reinforcement learning algorithms. We propose that combining these two levels of abstraction leads to a fast global solution of nonlinear control problems using reduced number of controllers. Our framework learns the local task dynamics from naive experience and forms locally optimal infinite horizon Linear Quadratic Regulators which produce continuous low-level control. A top-level reinforcement learner uses the controllers as actions and learns how to best combine them in state space while maximising a long-term reward. A single optimal control objective function drives high-level symbolic learning by providing training signals on desirability of each selected controller. We show that a small number of locally optimal linear controllers are able to solve global nonlinear control problems with unknown dynamics when combined with a reinforcement learner in this hierarchical framework. Our algorithm competes in terms of computational cost and solution quality with sophisticated control algorithms and we illustrate this with solutions to benchmark problems.

controller, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1903.03064

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

#artificialintelligenceMar-6-2019, 08:47:38 GMT

Reinforcement Learning with Python – Towards Data Science

Recently we've been seeing computers playing games against humans, either as bots in multiplayer games or as opponents in one-on-one games like Dota2, PUB-G, Mario. Deepmind(a research company) made history when the news that their AlphaGo program defeated the South Korean Go world champion in 2016. If you're an intense gamer, probably you must have listened about Dota 2 OpenAI Five match, where machines played against humans and defeated world top Dota2 players in few matches (If you are interested about this, here is the complete analysis of the algorithm and the game played by the machine). So here's the central question, Why do we need reinforcement learning? Is it only used for games?

artificial intelligence, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Country: Asia > South Korea (0.25)

Industry:

Leisure & Entertainment > Games > Computer Games (0.75)
Leisure & Entertainment > Games > Go (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Parisotto, Emilio, Ghosh, Soham, Yalamanchi, Sai Bhargav, Chinnaobireddy, Varsha, Wu, Yuhuai, Salakhutdinov, Ruslan

Concurrent Meta Reinforcement Learning

arXiv.org Artificial IntelligenceMar-6-2019

State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more and more challenging, and thus requiring more interaction episodes for the meta-learner, it needs the agent to reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents are executed in the same environment and each of these "rollout" agents are given the means to communicate with each other. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned or assign itself a particular section of the current task's state space. This framework is in contrast to standard RL methods that assume that each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods in a variety of challenging tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1903.0271

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.93)
Media > Television (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)