Goto

Collaborating Authors

 Reinforcement Learning


Hybrid Reinforcement Learning with Expert State Sequences

arXiv.org Artificial Intelligence

Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage state expert sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.


The Promise of Hierarchical Reinforcement Learning

#artificialintelligence

This top-down planning approach decides what a good subgoal is before planning to achieve it." "For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is mis- specified whenever, the representation cannot express any policy with acceptable performance.


Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

arXiv.org Machine Learning

Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.


Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning

arXiv.org Artificial Intelligence

It is incredibly easy for a system designer to misspecify the objective for an autonomous system ("robot''), thus motivating the desire to have the robot learn the objective from human behavior instead. Recent work has suggested that people have an interest in the robot performing well, and will thus behave pedagogically, choosing actions that are informative to the robot. In turn, robots benefit from interpreting the behavior by accounting for this pedagogy. In this work, we focus on misspecification: we argue that robots might not know whether people are being pedagogic or literal and that it is important to ask which assumption is safer to make. We cast objective learning into the more general form of a common-payoff game between the robot and human, and prove that in any such game literal interpretation is more robust to misspecification. Experiments with human data support our theoretical results and point to the sensitivity of the pedagogic assumption.


Now any business can access the same type of AI that powered AlphaGo

#artificialintelligence

A startup called CogitAI has developed a platform that lets companies use reinforcement learning, the technique that gave AlphaGo mastery of the board game Go. Gaining experience: AlphaGo, an AI program developed by DeepMind, taught itself to play Go by practicing. It's practically impossible for a programmer to manually code in the best strategies for winning. Instead, reinforcement learning let the program figure out how to defeat the world's best human players on its own. Drug delivery: Reinforcement learning is still an experimental technology, but it is gaining a foothold in industry. Amazon recently launched a reinforcement-learning platform, but it is aimed more at researchers and academics.


This new AI tool ages faces in videos with creepy accuracy

#artificialintelligence

A new machine learning paper shows how AI can take footage of someone and duplicate the video with the subject looking an age the researchers specify. The team behind the paper, from the University of Arkansas, Clemson University, Carnegie Mellon University, and Concordia University in Canada, claim that this is one of the first methods to use AI to tackle aging in videos. The system was trained on an expanded dataset of photos of showing individuals at different ages. Reinforcement learning, a technique that rewards an AI model for getting a task correct, comes into play by rewarding the system when the synthesized features, like wrinkles, appear similarly across consecutive video frames. Similar approaches power the "deepfake" technology that has raised alarms about the prospect of AI-powered video propaganda.


AI Startup Invents Trick For Robots To More Efficiently Teach Themselves Complex Tasks

#artificialintelligence

Google-owned DeepMind uses sophisticated computer simulations for computers to teach themselves how to accomplish certain tasks. The simulated training, known as reinforcement learning, involves the computer trying out thousands (or millions) of different things until it manages to figure out what to do. Using this approach combined with deep learning, the London-based artificial intelligence research unit is teaching computers how to beat the world's best Go players and training robots how to move around in the world. A tiny Berkeley, California-based AI startup, Bonsai, has invented a trick to beat DeepMind in this game. The trick -- the company is calling it "concept networks" -- massively increases the efficiency of reinforcement learning.


Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

arXiv.org Artificial Intelligence

In standard reinforcement learning, each new skill requires a manually-designed reward function, which takes considerable manual effort and engineering. Self-supervised goal setting has the potential to automate this process, enabling an agent to propose its own goals and acquire skills that achieve these goals. However, such methods typically rely on manually-designed goal distributions, or heuristics to force the agent to explore a wide range of states. We propose a formal exploration objective for goal-reaching policies that maximizes state coverage. We show that this objective is equivalent to maximizing the entropy of the goal distribution together with goal reaching performance, where goals correspond to entire states. We present an algorithm called Skew-Fit for learning such a maximum-entropy goal distribution, and show that under certain regularity conditions, our method converges to a uniform distribution over the set of possible states, even when we do not know this set beforehand. Skew-Fit enables self-supervised agents to autonomously choose and practice diverse goals. Our experiments show that it can learn a variety of manipulation tasks from images, including opening a door with a real robot, entirely from scratch and without any manually-designed reward function.


Learning Self-Game-Play Agents for Combinatorial Optimization Problems

arXiv.org Artificial Intelligence

Recent progress in reinforcement learning (RL) using self-game-play has shown remarkable performance on several board games (e.g., Chess and Go) as well as video games (e.g., Atari games and Dota2). It is plausible to consider that RL, starting from zero knowledge, might be able to gradually approximate a winning strategy after a certain amount of training. In this paper, we explore neural Monte-Carlo-Tree-Search (neural MCTS), an RL algorithm which has been applied successfully by DeepMind to play Go and Chess at a super-human level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka's Game-Theoretical Semantics, we propose the Zermelo Gamification (ZG) to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problem. The ZG also provides a specially designed neural MCTS. We use a combinatorial planning problem for which the ground-truth policy is efficiently computable to demonstrate that ZG is promising.


Learning Heuristics over Large Graphs via Deep Reinforcement Learning

arXiv.org Artificial Intelligence

In this paper, we propose a deep reinforcement learning framework called GCOMB to learn algorithms that can solve combinatorial problems over large graphs. GCOMB mimics the greedy algorithm in the original problem and incrementally constructs a solution. The proposed framework utilizes Graph Convolutional Network (GCN) to generate node embeddings that predicts the potential nodes in the solution set from the entire node set. These embeddings enable an efficient training process to learn the greedy policy via Q-learning. Through extensive evaluation on several real and synthetic datasets containing up to a million nodes, we establish that GCOMB is up to 41% better than the state of the art, up to seven times faster than the greedy algorithm, robust and scalable to large dynamic networks.