AITopics

1902.04546

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Arumugam, Dilip, Lee, Jun Ki, Saskin, Sophie, Littman, Michael L.

Deep Reinforcement Learning from Policy-Dependent Human Feedback

arXiv.org Machine Learning Feb-12-2019

To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent's policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional observations, while also satisfying the constraint of obtaining reduced sample complexity. We demonstrate the effectiveness of our Deep COACH algorithm in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.
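
For readers new to COACH, the core update of the original (non-deep) algorithm can be sketched as follows. The human's scalar feedback f in {-1, 0, +1} is treated as an advantage-like signal that scales an eligibility trace of the policy's score function, grad log pi(a|s). This is a minimal tabular softmax sketch that assumes feedback arrives immediately for the action just taken. It omits the Deep COACH modifications described above (deep network policy, raw-pixel input), and the names `CoachAgent`, `lr` and `trace_decay` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


class CoachAgent:
    """Hypothetical tabular softmax policy trained with a COACH-style update.

    Assumption: feedback for (s, a) arrives immediately. Real COACH handles
    delayed human feedback, and Deep COACH uses a deep network policy.
    """

    def __init__(self, n_states, n_actions, lr=0.5, trace_decay=0.9):
        self.theta = [[0.0] * n_actions for _ in range(n_states)]  # policy logits
        self.trace = [[0.0] * n_actions for _ in range(n_states)]  # eligibility trace
        self.lr = lr
        self.trace_decay = trace_decay

    def policy(self, s):
        return softmax(self.theta[s])

    def update(self, s, a, feedback):
        """Apply human feedback in {-1, 0, +1} to action a taken in state s."""
        probs = self.policy(s)
        # Decay the whole trace: e <- lambda * e
        for row in self.trace:
            for j in range(len(row)):
                row[j] *= self.trace_decay
        # Add grad log pi(a|s); for softmax logits this is 1{b == a} - pi(b|s)
        for b, p in enumerate(probs):
            self.trace[s][b] += (1.0 if b == a else 0.0) - p
        # theta <- theta + lr * feedback * e
        for row_t, row_e in zip(self.theta, self.trace):
            for j in range(len(row_t)):
                row_t[j] += self.lr * feedback * row_e[j]


agent = CoachAgent(n_states=1, n_actions=2)
for _ in range(20):
    agent.update(0, 1, +1)  # the trainer repeatedly approves action 1
print(agent.policy(0))
```

Because the feedback multiplies the trace rather than being added to a return, the same signal means "do more of what you just did" or "do less of it" relative to the current policy, which is the policy-dependent interpretation the abstract refers to.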

agent, algorithm, trainer

1902.04257

Country:

Europe > Sweden > Skåne County > Malmö (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting > Online (0.68)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Machine Learning Feb-12-2019

Emergence of Hierarchy via Reinforcement Learning Using a Multiple Timescale Stochastic RNN

Han, Dongqi, Doya, Kenji, Tani, Jun

Although recurrent neural networks (RNNs) for reinforcement learning (RL) have shown unique advantages in various respects, e.g., solving memory-dependent tasks and meta-learning, very few studies have demonstrated how RNNs can solve the problem of hierarchical RL by autonomously developing hierarchical control. In this paper, we propose a novel model-free RL framework called ReMASTER, which combines an off-policy actor-critic algorithm with a multiple timescale stochastic recurrent neural network for solving memory-dependent and hierarchical tasks. We performed experiments using a challenging continuous control task and showed that: (1) the internal representation necessary for achieving hierarchical control develops autonomously through exploratory learning; and (2) stochastic neurons in RNNs enable faster relearning when adapting to a new task that is a recomposition of previously learned sub-goals.
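
The "multiple timescale" idea can be illustrated with leaky-integrator units, where a time constant tau sets how quickly each unit's internal state follows its input. The sketch below is a minimal, assumed formulation of one such update; the Gaussian noise term is only a stand-in for stochastic neurons and is not the exact ReMASTER parameterization, and `mtrnn_step` and its arguments are hypothetical names.

```python
import math
import random


def mtrnn_step(u, x, tau, w_rec, w_in, noise_std=0.0, rng=None):
    """One leaky-integrator step for a multiple-timescale RNN layer (sketch).

    u_i <- (1 - 1/tau_i) * u_i
           + (1/tau_i) * (sum_j w_rec[i][j] * tanh(u_j) + w_in[i] * x + noise_i)

    Large tau_i -> slowly changing unit; small tau_i -> fast unit.
    The optional Gaussian noise is an assumed stand-in for stochastic neurons.
    """
    rng = rng or random.Random(0)
    h = [math.tanh(uj) for uj in u]
    new_u = []
    for i, (ui, ti) in enumerate(zip(u, tau)):
        drive = sum(w * hj for w, hj in zip(w_rec[i], h)) + w_in[i] * x
        if noise_std > 0.0:
            drive += rng.gauss(0.0, noise_std)
        new_u.append((1.0 - 1.0 / ti) * ui + (1.0 / ti) * drive)
    return new_u


# Two uncoupled units driven by the same step input: fast (tau=2) vs slow (tau=20)
u = [0.0, 0.0]
for _ in range(5):
    u = mtrnn_step(u, x=1.0, tau=[2.0, 20.0],
                   w_rec=[[0.0, 0.0], [0.0, 0.0]], w_in=[1.0, 1.0])
print(u)
```

After five steps the fast unit has nearly reached the input level while the slow unit has moved only about a fifth of the way, which is the kind of separation that lets slow units carry higher-level context.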

experiment, remaster, representation

1901.10113

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.05)
Asia > Japan > Honshū > Kantō > Tochigi Prefecture > Utsunomiya (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.93)
Education (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

#artificialintelligence Feb-11-2019, 21:11:56 GMT

Podcast #297: Reinforcement Learning with AWS DeepRacer | Amazon Web Services

How are ML models trained? How can developers learn different approaches to solving business problems? Todd Escalona (Solutions Architect Evangelist, AWS) joins Simon to dive into reinforcement learning and AWS DeepRacer! The AWS Podcast is a cloud platform podcast for developers, DevOps, and cloud professionals seeking the latest news and trends in storage, security, infrastructure, serverless, and more. Join Simon Elisha and Jeff Barr for regular updates, deep dives, and interviews.

machine learning, podcast, reinforcement learning

#artificialintelligence

Industry: Information Technology > Services (0.98)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

#artificialintelligence Feb-11-2019, 04:42:38 GMT

Reinforcement Learning: Coming to a Home Called Yours!

I loved playing StarCraft, though I seldom played against other humans (especially my sons, because they absolutely kick my butt). But ah, there is finally revenge for "Dad the Data Nerd", and it's known as AlphaStar. AlphaStar was developed by Google's DeepMind AI group to master the game of StarCraft. StarCraft is much trickier for AI to master than games like Go and Mario Bros because of its unbounded complexity, continuous gameplay (rather than the distinct events that occur when players take turns), evolving battlefield situations, and dependence on constantly tweaking one's in-game strategy. I want to spend the rest of this blog post doing a deep dive on Reinforcement Learning, because to me it is the trial-and-error nature of learning that places Reinforcement Learning squarely at the heart of future Artificial Intelligence aspirations.

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.99)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ringstrom, Thomas J., Schrater, Paul R.

Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning

arXiv.org Artificial Intelligence Feb-11-2019

[...] capture dependencies between sequential time-constrained goal states because the state-space must be prohibitively expanded to accommodate a history of successfully achieved sub-goals. Also, policies and value functions derived with stationarity assumptions are not readily decomposable, leading to a tension between reward maximization and task generalization. We demonstrate a logic-compatible approach using model-based knowledge of environment dynamics and deadline information to directly infer non-stationary policies composed of reusable stationary policies. The policies are constructed to maximize the probability of satisfying time-sensitive goals while respecting time-varying obstacles. Our approach explicitly maintains two different spaces, a high-level logical task specification where the task-variables are grounded onto the low-level state-space of [...]

The detective will need to reason about the order in which these sub-goals are executed and may need to use knowledge of individual deadlines to put constraints on the possible sub-goal sequences. For example, the detective knows that two key witnesses will be leaving town for work in the morning and the two main suspects will likely leave town later in the day. The detective will thus conclude that the witnesses must be questioned first so that there is enough time and evidence to arrest and interrogate the suspects, as they cannot be held in custody for longer than a day. The order in which the two witnesses are questioned and the order in which the two suspects are arrested does not matter for the satisfaction of the task, which only requires that all sub-goals are met before their individual deadlines, leading to four distinct possible sequences of sub-goals that can be executed. Furthermore, the difficulty of this task is compounded by the fact that the detective must have knowledge of the underlying movement constraints and knowledge of the dynamics of the environment.
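
The count of four sub-goal sequences in the detective example follows from ordering alone: 2 orderings of the witnesses times 2 orderings of the suspects. A brute-force check is sketched below, assuming the deadline reasoning reduces to "both witnesses are questioned before either suspect is arrested" and ignoring movement dynamics entirely; the sub-goal labels are hypothetical.

```python
from itertools import permutations

# Hypothetical sub-goal labels for the detective example. Assumption: the
# deadlines reduce to "every witness is questioned before any suspect is
# arrested"; travel time and environment dynamics are ignored.
WITNESSES = ("question_witness_1", "question_witness_2")
SUSPECTS = ("arrest_suspect_1", "arrest_suspect_2")


def respects_deadlines(order):
    """True if the last witness comes before the first suspect."""
    last_witness = max(order.index(w) for w in WITNESSES)
    first_suspect = min(order.index(s) for s in SUSPECTS)
    return last_witness < first_suspect


def admissible_sequences():
    """All orderings of the four sub-goals that satisfy the ordering constraint."""
    return [order for order in permutations(WITNESSES + SUSPECTS)
            if respects_deadlines(order)]


for seq in admissible_sequences():
    print(" -> ".join(seq))
```

Of the 24 possible orderings, only the four that put both witnesses first survive, which matches the count given in the text.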

constraint satisfaction propagation, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

1901.10405

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Shah, Rohin, Krasheninnikov, Dmitrii, Alexander, Jordan, Abbeel, Pieter, Dragan, Anca

Preferences Implicit in the State of the World

Reinforcement learning (RL) agents optimize only the features specified in a reward function and are indifferent to anything left out inadvertently. This means that we must specify not only what to do, but also the much larger space of what not to do. It is easy to forget these preferences, since they are already satisfied in our environment. This motivates our key insight: when a robot is deployed in an environment that humans act in, the state of the environment is already optimized for what humans want. We can therefore use this implicit preference information from the state to fill in the blanks. We develop an algorithm based on Maximum Causal Entropy IRL and use it to evaluate the idea in a suite of proof-of-concept environments designed to show its properties. We find that information from the initial state can be used to infer both side effects that should be avoided and preferences for how the environment should be organized.
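
Maximum Causal Entropy IRL rests on a "soft" Bellman backup, in which the max over actions is replaced by a log-sum-exp and the resulting policy is a softmax over Q-values. The sketch below shows only that forward computation on an assumed deterministic toy MDP; it is not the paper's algorithm for inferring rewards from the initial state, and the function names and corridor environment are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(n_states, n_actions, transition, reward, gamma=0.9, iters=200):
    """Soft Bellman backups as used in maximum causal entropy (MCE) IRL.

    Q(s, a) = r(s) + gamma * V(s'),   V(s) = logsumexp_a Q(s, a),
    pi(a | s) = exp(Q(s, a) - V(s)).
    Assumption: deterministic dynamics, transition(s, a) -> next state.
    """
    V = [0.0] * n_states
    Q = []
    for _ in range(iters):
        Q = [[reward(s) + gamma * V[transition(s, a)] for a in range(n_actions)]
             for s in range(n_states)]
        V = [logsumexp(q) for q in Q]
    policy = [[math.exp(q - v) for q in qs] for qs, v in zip(Q, V)]
    return V, policy


# Toy 3-state corridor: action 0 moves left, action 1 moves right; reward only in state 2.
V, pi = soft_value_iteration(
    n_states=3,
    n_actions=2,
    transition=lambda s, a: max(s - 1, 0) if a == 0 else min(s + 1, 2),
    reward=lambda s: 1.0 if s == 2 else 0.0,
)
print(pi[0])
```

The resulting policy is stochastic but tilted toward the rewarded state. An IRL method then searches for reward parameters under which observed behaviour (here, in the paper's setting, the observed state) becomes likely under such a policy.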

gradient, trajectory, vase

1902.04198

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.91)

Kuang, Nikki Lijing, Leung, Clement H. C.

Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

In reinforcement learning, a decision needs to be made at some point as to whether it is worthwhile to carry on with the learning process or to terminate it. In many such situations, stochastic elements govern the occurrence of rewards, with positive rewards randomly interleaved with negative rewards. For most practical learners, the learning is considered useful if the number of positive rewards always exceeds the number of negative ones. A situation that often calls for learning termination is when the number of negative rewards exceeds the number of positive rewards. However, while this seems reasonable, the error of premature termination, whereby learning is terminated and judged a failure even though the positive rewards would eventually far outnumber the negative ones, can be significant. In this paper, using combinatorial analysis, we study the probability of wrongly terminating a reinforcement learning activity, an error which undermines the effectiveness of an optimal policy, and we show that the resulting error can be quite high. Whilst we demonstrate mathematically that such errors can never be eliminated, we propose some practical mechanisms that can effectively reduce them. Simulation experiments have been carried out, and the results are in close agreement with our theoretical findings.
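
The kind of combinatorial result involved can be illustrated with the classical ballot problem. Assume, for illustration only (the paper's exact model may differ), that a fixed total of `pos` positive and `neg` negative rewards arrive in a uniformly random order, and that learning is stopped the first time negatives outnumber positives. Then for pos >= neg the probability of premature termination is neg / (pos + 1), a standard reflection-principle result. The sketch checks that closed form against brute-force enumeration; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def premature_termination_prob(pos, neg):
    """Exact probability, over all equally likely orderings of `pos` positive
    and `neg` negative rewards, that the running count of negatives ever
    exceeds the running count of positives (brute-force enumeration)."""
    n = pos + neg
    bad = 0
    total = 0
    for neg_positions in combinations(range(n), neg):
        neg_set = set(neg_positions)
        total += 1
        balance = 0  # positives minus negatives so far
        for i in range(n):
            balance += -1 if i in neg_set else 1
            if balance < 0:  # termination rule fires
                bad += 1
                break
    return Fraction(bad, total)


def ballot_formula(pos, neg):
    """Closed form for pos >= neg from the reflection principle: neg / (pos + 1)."""
    return Fraction(neg, pos + 1)


print(premature_termination_prob(10, 5), ballot_formula(10, 5))
```

For example, with 10 positive and 5 negative rewards both functions give 5/11: in this toy model almost half of all orderings would trigger termination even though positives outnumber negatives two to one.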

negative reward, probability, termination error

doi: 10.1109/AIKE.2018.00028

1902.04179

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kuang, Nikki Lijing, Leung, Clement H. C., Sung, Vienne W. K.

Stochastic Reinforcement Learning

In reinforcement learning episodes, the rewards and punishments are often non-deterministic, and there are invariably stochastic elements governing the underlying situation. Such stochastic elements are often numerous and cannot be known in advance, and they tend to obscure the underlying reward and punishment patterns. Indeed, if stochastic elements were absent, the same outcome would occur every time and the learning problems involved could be greatly simplified. In addition, in most practical situations, the cost of an observation to receive either a reward or a punishment can be significant, and one would wish to arrive at the correct learning conclusion while incurring minimal cost. In this paper, we present a stochastic approach to reinforcement learning which explicitly models the variability present in the learning environment and the cost of observation. Criteria and rules for learning success are quantitatively analyzed, and the probabilities of exceeding the observation cost bounds are also obtained.
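
The abstract does not spell out its cost model, so the following is only an illustrative sketch under assumed rules: each observation costs a fixed integer number of cost units, independently yields a positive reward with probability p, and learning counts as successful at the k-th positive reward. Under those assumptions the probability of exceeding a cost budget has a closed form via the binomial distribution. The function `prob_cost_exceeds` and all of its parameters are hypothetical.

```python
from math import comb


def prob_cost_exceeds(budget, cost_per_obs, k, p):
    """P(total observation cost > budget) in a hypothetical stopping model.

    Assumptions (illustrative, not taken from the paper): each observation
    costs `cost_per_obs` integer cost units, independently yields a positive
    reward with probability p, and learning succeeds at the k-th positive.
    Cost exceeds the budget iff more than m = budget // cost_per_obs
    observations are needed, i.e. iff fewer than k positives occur in the
    first m observations.
    """
    m = budget // cost_per_obs  # number of observations the budget can pay for
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k))


print(prob_cost_exceeds(budget=20, cost_per_obs=2, k=3, p=0.4))
```

Integer cost units are used so that `budget // cost_per_obs` is exact; with floating-point costs the floor division can be off by one.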

negative reward, positive reward, probability

doi: 10.1109/AIKE.2018.00055

1902.04178

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.50)

Industry: Education (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lee, Jaeyoung, Balakrishnan, Aravind, Gaurav, Ashish, Czarnecki, Krzysztof, Sedwards, Sean

WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving

Machine learning can provide efficient solutions to the complex problems encountered in autonomous driving, but ensuring their safety remains a challenge. A number of authors have attempted to address this issue, but there are few publicly-available tools to adequately explore the trade-offs between functionality, scalability, and safety. We thus present WiseMove, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving. WiseMove adopts a modular learning architecture that suits our current research questions and can be adapted to new technologies and new questions. We present the details of WiseMove, demonstrate its use on a common traffic scenario, and describe how we use it in our ongoing safe learning research.
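
One common way to combine a learned high-level policy with safety constraints in such a modular architecture is to let the policy rank a small set of maneuvers and to discard any maneuver whose safety check fails in the current state. The sketch below illustrates that general pattern only. It is not WiseMove's API or its actual safety mechanism, and the maneuver names, the `gap_left_m` feature and the 10 m threshold are all hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, float]  # hypothetical feature dict, not WiseMove's state representation


def choose_maneuver(
    state: State,
    scores: Dict[str, float],
    is_safe: Dict[str, Callable[[State], bool]],
    fallback: str = "keep_lane",
) -> str:
    """Return the highest-scoring maneuver whose safety predicate holds.

    `scores` stands in for a learned high-level policy's preferences and
    `is_safe` for per-maneuver safety checks; both are illustrative.
    """
    for maneuver in sorted(scores, key=scores.get, reverse=True):
        if is_safe[maneuver](state):
            return maneuver
    return fallback  # nothing passed the checks: use a designated safe default


scores = {"keep_lane": 0.2, "change_left": 0.7, "stop": 0.1}
is_safe = {
    "keep_lane": lambda s: True,
    "change_left": lambda s: s["gap_left_m"] > 10.0,  # hypothetical gap threshold (metres)
    "stop": lambda s: True,
}
print(choose_maneuver({"gap_left_m": 5.0}, scores, is_safe))
print(choose_maneuver({"gap_left_m": 25.0}, scores, is_safe))
```

Keeping the safety check outside the learned policy means either component can be swapped independently, which is the sort of trade-off between functionality and safety the framework is meant to explore.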

architecture, high-level policy, wisemove

1902.04118

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Automobiles & Trucks (0.92)
Transportation > Ground > Road (0.82)
Information Technology > Robotics & Automation (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)