AITopics

1809.07731

Country: North America > Canada > Alberta (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game

Sun, Peng, Sun, Xinghai, Han, Lei, Xiong, Jiechao, Wang, Qing, Li, Bo, Zheng, Yang, Liu, Ji, Liu, Yongsheng, Liu, Han, Zhang, Tong

StarCraft II (SCII) is widely considered the most challenging Real Time Strategy (RTS) game to date, due to its large observation space, huge (continuous and infinite) action space, partial observation, multi-player simultaneous game model, long time horizon decisions, etc. To push the frontier of AI's capability, DeepMind and Blizzard jointly presented the StarCraft II Learning Environment (SC2LE), a testbench for designing complex decision making systems. While SC2LE provides a few mini games such as MoveToBeacon, CollectMineralShards, and DefeatRoaches where some AI agents achieve the professional player's level, AI is still far from achieving the professional level in a full game. To initiate research and investigation in the full game, we develop two AI agents: TStarBot1 is based on deep reinforcement learning over a flat action structure, and TStarBot2 is based on a rule controller over a hierarchical action structure. Both TStarBot1 and TStarBot2 are able to defeat the builtin AI agents from level 1 to level 10 in a full game (1v1 Zerg-vs-Zerg game on the AbyssalReef map), noting that level 8, level 9, and level 10 are cheating agents with full vision on the whole map, with resource harvest boosting, and with both, respectively. (Footnote: According to some informal discussions from the StarCraft II forum, the level 10 builtin AI is estimated to be Platinum to Diamond [scii-forum], which is equivalent to the top 50% to 30% of human players in the ranking system of Battle.net Leagues [liquid].)

machine learning, macro action, reinforcement learning, (17 more...)

1809.07193

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

De Asis, Kristopher, Bennett, Brendan, Sutton, Richard S.

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of TD learning is that it is learning predictive knowledge about the environment in the form of value functions, from which it can derive its behavior to address long-term sequential decision making problems. The agent's horizon of interest, that is, how immediate or long-term a TD learning agent predicts into the future, is adjusted through a discount rate parameter. In this paper, we introduce an alternative view on the discount rate, with insight from digital signal processing, to include complex-valued discounting. Our results show that setting the discount rate to appropriately chosen complex numbers allows for online and incremental estimation of the Discrete Fourier Transform (DFT) of a signal of interest with TD learning. We thereby extend the types of knowledge representable by value functions, which we show are particularly useful for identifying periodic effects in the reward sequence.
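
The link between complex-valued discounting and the DFT can be illustrated with a small sketch. This is not the paper's algorithm or code; it only checks two identities under stated assumptions: (1) over one finite window of length N, a return discounted by gamma = exp(-2*pi*i*k/N) equals DFT bin k of the reward signal, and (2) tabular TD(0) on a deterministic cyclic chain with a complex discount of magnitude below 1 converges to the corresponding complex-valued value function. All function names and parameter values here are hypothetical.

```python
import cmath


def complex_discount(k, n, magnitude=1.0):
    """Discount whose phase matches DFT bin k of a length-n window (assumed form)."""
    return magnitude * cmath.exp(-2j * cmath.pi * k / n)


def discounted_return(rewards, gamma):
    """Backward recursion G_t = r_t + gamma * G_{t+1}; returns G_0."""
    g = 0j
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def dft_bin(signal, k):
    """Direct evaluation of DFT bin k, used only as a reference."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
               for t, x in enumerate(signal))


def td0_cyclic(signal, gamma, alpha=0.5, sweeps=2000):
    """Tabular TD(0) on a deterministic cycle: state s emits signal[s], then moves to s+1."""
    n = len(signal)
    v = [0j] * n
    for _ in range(sweeps):
        for s in range(n):
            target = signal[s] + gamma * v[(s + 1) % n]
            v[s] += alpha * (target - v[s])
    return v


# A reward signal with period 4, observed over a window of 8 steps.
signal = [1, 0, -1, 0, 1, 0, -1, 0]

# Identity (1): with |gamma| = 1 the one-window return is exactly DFT bin 2.
g = discounted_return(signal, complex_discount(2, 8))

# Identity (2): with |gamma| < 1, TD(0) converges to V(0) = G_0 / (1 - gamma**N).
gamma = complex_discount(2, 8, magnitude=0.9)
v = td0_cyclic(signal, gamma)
```

With a magnitude below 1 the learned value is a decayed version of the DFT bin, approaching it (up to the `1 - gamma**N` factor) as the magnitude approaches 1.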

artificial intelligence, machine learning, reinforcement learning, (19 more...)

1809.07435

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sunder, Vishal, Vig, Lovekesh, Chatterjee, Arnab, Shroff, Gautam

Prosocial or Selfish? Agents with different behaviors for Contract Negotiation using Reinforcement Learning

We present an effective technique for training deep learning agents capable of negotiating on a set of clauses in a contract agreement using a simple communication protocol. We use Multi Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial behavior to varying degrees in these agents. Empirical evidence is provided showing consistency in agent behaviors. We further train a meta agent with a mixture of behaviors by learning an ensemble of different models using reinforcement learning. Finally, to ascertain the deployability of the negotiating agents, we conducted experiments pitting the trained agents against human players. Results demonstrate that the agents are able to hold their own against human players, often emerging as winners in the negotiation. Our experiments demonstrate that the meta agent is able to reasonably emulate human behavior.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1809.07066

Country: Asia > India > NCT (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Law > Contract Law (0.72)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nagarajan, Prabhat, Warnell, Garrett, Stone, Peter

Deterministic Implementations for Reproducibility in Deep Reinforcement Learning

While deep reinforcement learning (DRL) has led to numerous successes in recent years, reproducing these successes can be extremely challenging. One reproducibility challenge particularly relevant to DRL is nondeterminism in the training process, which can substantially affect the results. Motivated by this challenge, we study the positive impacts of deterministic implementations in eliminating nondeterminism in training. To do so, we consider the particular case of the deep Q-learning algorithm, for which we produce a deterministic implementation by identifying and controlling all sources of nondeterminism in the training process. One by one, we then allow individual sources of nondeterminism to affect our otherwise deterministic implementation, and measure the impact of each source on the variance in performance. We find that individual sources of nondeterminism can substantially impact the performance of the agent, illustrating the benefits of deterministic implementations. We also discuss the important role of deterministic implementations in achieving exact replicability of results.
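
As a rough illustration of controlling individual sources of nondeterminism, the sketch below gives each random source its own seeded generator. The abstract does not list the sources the authors studied; exploration noise and replay-buffer sampling are assumed here as two plausible ones, and `run_training` with all its parameters is hypothetical. No learning happens in this toy loop; it only records which minibatches would be drawn.

```python
import random


def run_training(seed, steps=200, epsilon=0.1, buffer_size=50, batch_size=4):
    """Toy loop that records the sampled minibatches for a given seed."""
    # Separate generators, so each source can be fixed or varied on its own.
    explore_rng = random.Random(seed)      # assumed source 1: epsilon-greedy exploration
    replay_rng = random.Random(seed + 1)   # assumed source 2: replay-buffer sampling
    buffer, trace = [], []
    for t in range(steps):
        if explore_rng.random() < epsilon:
            action = explore_rng.randrange(4)  # random exploratory action
        else:
            action = 0                          # stand-in for the greedy action
        buffer.append((t, action))
        if len(buffer) > buffer_size:
            buffer.pop(0)                       # drop the oldest transition
        if len(buffer) >= batch_size:
            trace.append(tuple(replay_rng.sample(buffer, batch_size)))
    return trace


same_a = run_training(seed=0)
same_b = run_training(seed=0)
other = run_training(seed=1)
```

Two runs with the same seed produce identical traces, while changing the seed changes them. A real deep Q-learning setup would have further sources to control (for example network initialization and the environment), which this sketch does not cover.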

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1809.05676

Country:

Asia > Japan (0.28)
North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

#artificialintelligence Sep-18-2018, 02:57:45 GMT

Reinforcement Learning Series Intro - Syllabus Overview

Welcome to this series on reinforcement learning! We'll start out by introducing the absolute basics to build solid ground for us to run on. We'll then progress to more advanced and sophisticated topics that integrate artificial neural networks and deep learning into reinforcement learning. We'll also be getting our hands dirty by implementing some super cool reinforcement learning projects in code! Without further ado, let's get to it!

artificial intelligence, machine learning, reinforcement learning, (11 more...)

Industry:

Information Technology (0.44)
Education > Curriculum (0.40)
Banking & Finance > Trading (0.37)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Dornheim, Johannes, Link, Norbert, Gumbsch, Peter

Model-Free Adaptive Optimal Control of Sequential Manufacturing Processes using Reinforcement Learning

arXiv.org Artificial Intelligence Sep-18-2018

A self-learning optimal control algorithm for sequential manufacturing processes with time-discrete control actions is proposed and evaluated with simulated deep drawing processes. The necessary control model is built during consecutive process executions under optimal control via Reinforcement Learning, using the measured product quality as reward after each process execution. Prior model formation, which is required by state-of-the-art algorithms like Model Predictive Control and Approximate Dynamic Programming, is therefore obsolete. This avoids the difficulties in system identification and accurate modelling, which arise with processes subject to non-linear dynamics and stochastic influences. The runtime complexity problems of these approaches, which arise when more complex models and larger control prediction horizons are employed, are also avoided. Instead of using pre-created process and observation models, Reinforcement Learning algorithms build functions of expected future reward during processing, which are then used for optimal process control decisions. The learning of such expectation functions is realized online by interacting with the process. The proposed algorithm also takes stochastic variations of the process conditions into consideration and is able to cope with partial observability. A method for the adaptive optimal control of partially observable fixed-horizon manufacturing processes, based on Q-learning, is developed and studied. The resulting algorithm is instantiated and then evaluated by application to a time-stochastic optimal control problem in metal sheet deep drawing, where the experiments use FEM-simulated processes. The Reinforcement Learning based control shows superior results over the model-based Model Predictive Control and Approximate Dynamic Programming approaches.
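
The general idea of Q-learning on a fixed-horizon process, with the reward observed only after the final step, can be sketched on a toy problem. This is not the authors' algorithm: the toy process, its constants, and all names below are hypothetical, the state is fully observable (the paper addresses partial observability), and a plain table indexed by time step stands in for whatever function approximation the paper uses.

```python
import random

ACTIONS = (-1, 0, 1)   # hypothetical control adjustments
HORIZON = 3            # fixed number of control steps per process execution
TARGET = 3             # desired final state (stand-in for product quality)
MAX_STATE = 4


def step(state, action, rng, noise):
    """Toy process: with probability `noise` a disturbance cancels the action."""
    if rng.random() < noise:
        action = 0
    return min(max(state + action, 0), MAX_STATE)


def train(episodes=5000, alpha=0.1, noise=0.1, seed=0):
    """Tabular Q-learning with a time-indexed table q[t][state][action_index]."""
    rng = random.Random(seed)
    q = [[[0.0] * len(ACTIONS) for _ in range(MAX_STATE + 1)]
         for _ in range(HORIZON)]
    for _ in range(episodes):
        state = 0
        for t in range(HORIZON):
            a = rng.randrange(len(ACTIONS))        # uniform random exploration
            nxt = step(state, ACTIONS[a], rng, noise)
            if t == HORIZON - 1:
                target = -abs(nxt - TARGET)        # reward only after the last step
            else:
                target = max(q[t + 1][nxt])        # bootstrap from the next time step
            q[t][state][a] += alpha * (target - q[t][state][a])
            state = nxt
    return q


def greedy_rollout(q):
    """Follow the learned greedy policy on the disturbance-free process."""
    rng = random.Random(0)
    state = 0
    for t in range(HORIZON):
        a = max(range(len(ACTIONS)), key=lambda i: q[t][state][i])
        state = step(state, ACTIONS[a], rng, noise=0.0)
    return state


final_state = greedy_rollout(train())
```

Because Q-learning is off-policy, the table can be learned from uniformly random control actions and still yield a greedy policy that steers the disturbance-free process to the target state.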

deep learning, optimal control, upstream oil & gas, (18 more...)

1809.06646

Country: Europe > Germany (0.28)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Brown, Alexander, Petrik, Marek

Interpretable Reinforcement Learning with Ensemble Methods

arXiv.org Machine Learning Sep-18-2018

Reinforcement learning continues to break bounds on what we even thought possible, recently with AlphaGo's triumph over leading Go player Lee Sedol and with the further successes of AlphaGoZero, which surpassed AlphaGo learning only from self-play [14]. While the performance of such systems is impressive and very useful, sometimes it is desirable to understand and interpret the actions of a reinforcement learning system, and machine learning systems in general. These circumstances are more common in high-pressure applications, such as healthcare, targeted advertising, or finance [6]. For example, researchers at the University of Pittsburgh Medical Center trained a variety of machine learning models including neural networks and decision trees to predict whether pneumonia patients might develop severe complications. The neural networks performed the best on their testing data, but upon examination of the rules of the decision trees, the researchers found that the trees recommended sending pneumonia patients who had asthma directly home, despite the fact that asthma makes patients with pneumonia much more likely to suffer complications.

machine learning, reinforcement, reinforcement learning, (15 more...)

1809.06995

Country:

North America > United States > New York (0.15)
North America > United States > New Jersey (0.14)

Genre: Research Report (0.84)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Go (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Merzic, Hamza, Bogdanovic, Miroslav, Kappler, Daniel, Righetti, Ludovic, Bohg, Jeannette

Leveraging Contact Forces for Learning to Grasp

arXiv.org Artificial Intelligence Sep-18-2018

Grasping objects under uncertainty remains an open problem in robotics research. This uncertainty is often due to noisy or partial observations of the object pose or shape. To enable a robot to react appropriately to unforeseen effects, it is crucial that it continuously takes sensor feedback into account. While visual feedback is important for inferring a grasp pose and reaching for an object, contact feedback offers valuable information during manipulation and grasp acquisition. In this paper, we use model-free deep reinforcement learning to synthesize control policies that exploit contact sensing to generate robust grasping under uncertainty. We demonstrate our approach on a multi-fingered hand that exhibits more complex finger coordination than the commonly used two-fingered grippers. We conduct extensive experiments in order to assess the performance of the learned policies, with and without contact sensing. While it is possible to learn grasping policies without contact sensing, our results suggest that contact feedback allows for a significant improvement of grasping robustness under object pose uncertainty and for objects with a complex shape.

contact feedback, machine learning, reinforcement learning, (14 more...)

1809.07004

Country:

North America > United States (0.47)
Europe (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

arXiv.org Artificial Intelligence Sep-18-2018

SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions

Zhang, Chengwei, Li, Xiaohong, Hao, Jianye, Chen, Siqi, Tuyls, Karl, Feng, Zhiyong, Xue, Wanli, Chen, Rong

Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively little investigation. In this paper, we propose an independent learner hierarchical method, named Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the cooperative problem with continuous actions into two layers. The first layer samples a finite set of actions from the continuous action spaces by a re-sampling mechanism with variable exploratory rates, and the second layer evaluates the actions in the sampled action set and updates the policy using a reinforcement learning cooperative method. By constructing cooperative mechanisms at both levels, SCC-rFMQ can handle cooperative problems in continuous action cooperative Markov games effectively. The effectiveness of SCC-rFMQ is experimentally demonstrated on two well-designed games, i.e., a continuous version of the climbing game and a cooperative version of the boat problem. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms. A large number of multiagent coordination domains involve continuous action spaces, such as robot soccer [1] and multiplayer online battle arena games [2]. In such environments, agents not only need to coordinate with other agents towards desirable outcomes efficiently but also have to deal with infinitely large action spaces.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1809.06625

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.93)
Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)