Reinforcement Learning


Back to the core of intelligence … to really move to the future

#artificialintelligence

Two decades ago I started working on metrics of machine intelligence. At that time, during the glacial days of the second AI winter, few were really interested in measuring something that AI lacked completely. And very few, such as David L. Dowe and I, were interested in metrics of intelligence linked to algorithmic information theory, where the interaction between an agent and the world was modeled as sequences of bits, and intelligence was formulated using Solomonoff's and Wallace's theories of inductive inference. In the meantime, seemingly dozens of variants of the Turing test were proposed every year, CAPTCHAs were introduced, and David showed how easily some IQ tests can be solved by a very simple program based on a big-switch approach. Today, a new AI spring has arrived, triggered by a blossoming machine learning field, bringing a more experimental approach to AI with an increasing number of AI benchmarks and competitions (see a previous entry in this blog for a survey).


ACtuAL: Actor-Critic Under Adversarial Learning

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via back-propagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling and we use it to achieve improvements on language modeling tasks over the standard Teacher-Forcing methods.
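Why the actor-critic reframing sidesteps the discrete-data problem can be seen in a toy setting: the generator's tokens are sampled, so no gradient has to flow through the scorer, and the actor only needs a TD error from the critic. The following is a minimal tabular sketch, not the ACtuAL model itself. Sequence positions play the role of states, tokens are discrete actions, a TD(0) critic estimates V(t), and the actor's softmax logits are updated with the TD error as the advantage. The reward (1 when a token matches a fixed target) is a hypothetical stand-in for a learned discriminator, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def train_actor_critic(target, vocab_size, episodes=3000, gamma=0.9,
                       lr_actor=0.2, lr_critic=0.1, seed=0):
    """Per-position softmax generator trained with a tabular TD(0) critic.

    Assumption: reward = 1 if the sampled token equals target[t]; this is a
    hypothetical stand-in for a learned discriminator's score.
    """
    rng = np.random.default_rng(seed)
    T = len(target)
    logits = np.zeros((T, vocab_size))  # actor parameters, one row per position
    values = np.zeros(T + 1)            # critic V(t); values[T] = 0 is the terminal state
    for _ in range(episodes):
        for t in range(T):
            probs = softmax(logits[t])
            a = rng.choice(vocab_size, p=probs)  # discrete, non-differentiable sample
            r = 1.0 if a == target[t] else 0.0   # stand-in "discriminator" score
            td_error = r + gamma * values[t + 1] - values[t]
            values[t] += lr_critic * td_error    # TD(0) critic update
            grad_log_pi = -probs
            grad_log_pi[a] += 1.0                # gradient of log pi(a) w.r.t. the logits
            logits[t] += lr_actor * td_error * grad_log_pi  # TD error used as the advantage
    return logits, values

target = [2, 0, 3, 1, 2]
logits, values = train_actor_critic(target, vocab_size=4)
greedy = logits.argmax(axis=1).tolist()
print(greedy)
```

Note that the scorer is only ever queried for a scalar reward; in a GAN-style setup that scorer would itself be trained, which this sketch leaves out.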


Safe Model-based Reinforcement Learning with Stability Guarantees

arXiv.org Machine Learning

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied to safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.
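The core of Lyapunov stability verification is checking a decrease condition, V(f(x)) < V(x), over a region of the state space and taking the largest level set of V on which it holds as the certified safe region. The following is a much-simplified sketch of that check, assuming known inverted-pendulum dynamics, a hand-picked linear controller, and a quadratic Lyapunov function from the linearization. It omits what the paper adds on top: the Gaussian process model with confidence bounds, the Lipschitz argument that covers points between grid cells, and any learning. All parameter values are hypothetical.

```python
import numpy as np

# Hypothetical pendulum and controller parameters, chosen only for illustration.
G, LENGTH, MASS, FRICTION = 9.81, 0.5, 0.15, 0.1
DT, U_MAX = 0.01, 1.0
INERTIA = MASS * LENGTH ** 2
K = np.array([1.5, 0.2])  # hand-picked feedback u = -K @ x; stabilizes the linearization

def step(x):
    """Euler step of the nonlinear pendulum (upright = 0) under saturated feedback; x has shape (n, 2)."""
    theta, omega = x[:, 0], x[:, 1]
    u = np.clip(-(x @ K), -U_MAX, U_MAX)
    domega = G / LENGTH * np.sin(theta) - FRICTION / INERTIA * omega + u / INERTIA
    return np.stack([theta + DT * omega, omega + DT * domega], axis=1)

# Linearized closed loop around the upright equilibrium.
A = np.eye(2) + DT * np.array([[0.0, 1.0], [G / LENGTH, -FRICTION / INERTIA]])
B = DT * np.array([[0.0], [1.0 / INERTIA]])
A_cl = A - B @ K[None, :]

def quadratic_lyapunov(a_cl, q, iters=5000):
    """Solve A_cl^T P A_cl - P = -Q by fixed-point iteration (valid when A_cl is Schur stable)."""
    p = q.copy()
    for _ in range(iters):
        p = q + a_cl.T @ p @ a_cl
    return p

Q = np.eye(2)
P = quadratic_lyapunov(A_cl, Q)

def quad(x, p):
    """V(x) = x^T P x for each row of x."""
    return np.einsum("ni,ij,nj->n", x, p, x)

# Discretize the state space and test the decrease condition on every grid point.
thetas, omegas = np.meshgrid(np.linspace(-1.0, 1.0, 201), np.linspace(-4.0, 4.0, 201))
grid = np.column_stack([thetas.ravel(), omegas.ravel()])
v, v_next = quad(grid, P), quad(step(grid), P)

violations = (v_next >= v) & (v > 0)  # the origin itself is excluded
on_boundary = (np.abs(grid[:, 0]) == 1.0) | (np.abs(grid[:, 1]) == 4.0)
# Largest level c such that every grid point with 0 < V(x) < c decreases
# and the level set stays inside the gridded region.
c = min(v[violations].min(initial=np.inf), v[on_boundary].min())
certified = (v > 0) & (v < c)
print(f"certified level c = {c:.2f}, {certified.mean():.1%} of grid points certified")
```

Saturating the torque is what makes the certified region bounded here; in the paper's setting, uncertainty about the dynamics shrinks it further until data is collected.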


Berkeley startup to train robots like puppets

@machinelearnbot

Robots today must be programmed by writing computer code, but imagine donning a VR headset and virtually guiding a robot through a task, like you would move the arms of a puppet, and then letting the robot take it from there. That's the vision of Pieter Abbeel, a professor of electrical engineering and computer science at the University of California, Berkeley, and his students, Peter Chen, Rocky Duan and Tianhao Zhang, who have launched a startup, Embodied Intelligence Inc., to use the latest techniques of deep reinforcement learning and artificial intelligence to make industrial robots easily teachable. "Right now, if you want to set up a robot, you program that robot to do what you want it to do, which takes a lot of time and a lot of expertise," said Abbeel, who is currently on leave to turn his vision into reality. "With our advances in machine learning, we can write a piece of software once -- machine learning code that enables the robot to learn -- and then when the robot needs to be equipped with a new skill, we simply provide new data." The "data" is training, much like you'd train a human worker, though with the added dimension of virtual reality.


1107_release

#artificialintelligence

Building on the founders' pioneering research in deep imitation learning, deep reinforcement learning and meta-learning, Embodied Intelligence is developing AI software (aka robot brains) that can be loaded onto any existing robots. While traditional programming of robots requires writing code, a time-consuming endeavor even for robotics experts, Embodied Intelligence software will empower anyone to program a robot by simply donning a VR headset and guiding a robot through a task. These human demonstrations train deep neural nets, which are further tuned through the use of reinforcement learning, resulting in robots that can be easily taught a wide range of skills in areas where existing solutions break down. Complicated tasks like the manipulation of deformable objects such as wires, fabrics, linens, apparel, fluid-bags, and food; picking parts and order items out of cluttered, unstructured bins; completing assemblies where hard automation struggles due to variability in parts, configurations, and individualization of orders, are all candidates to benefit from Embodied Intelligence's work.


Deep reinforcement learning: where to start – freeCodeCamp

#artificialintelligence

More than 200 million people watched as reinforcement learning (RL) took to the world stage. A few years earlier, DeepMind had made waves with a bot that could play Atari games. The company was soon acquired by Google. Many researchers believe that RL is our best shot at creating artificial general intelligence. It is an exciting field, with many unsolved challenges and huge potential.


[D] What do you feel is currently undervalued / underappreciated in the field of machine learning? • r/MachineLearning

@machinelearnbot

Good reinforcement learning and other 'reasoning' benchmarks to measure progress: some set of increasingly hard tasks that can measurably show the different strengths of various models. My thought is that it wasn't just the data but everything around ImageNet that really pushed the field forward: the yearly competition, the talks and progress graphs, the anticipation and excitement to see how far the teams had pushed the limit this time. Reinforcement learning still needs its 'ImageNet moment', ideally some annual competition that can gain traction over time and get the big teams to invest resources in pushing the limits. The field lends itself well to simply adding more complex tasks as the models get stronger and stronger. I answered this question only in the sense of 'what would I, as an outsider, like to see', so feel free to disregard it, but I think there is something in human nature about competition that drives progress.


Learning in Brains and Machines (3): Synergistic and Modular Action

@machinelearnbot

Action synergies in the brain are paralleled by macro-actions and options in machines. In both brains and machines, these tools enable fast learning, strong generalisation and flexible action. But most importantly, for those of us who seek a deeper understanding of the brain, and of human and machine intelligence, this has illuminated the principles of modularity and abstraction, invaluable principles of biological and computational learning.
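In the options framework, a macro-action is specified by an initiation set, an intra-option policy, and a termination condition; once started, it runs for several primitive steps and is then valued as a single decision. A minimal sketch on a hypothetical 1-D corridor: an illustrative `run_right` option is executed to termination, and its outcome feeds one SMDP Q-learning update. The environment, the option, and the step size are assumptions for illustration, not taken from the post.

```python
from dataclasses import dataclass
from typing import Callable

N_STATES, GOAL, GAMMA = 10, 9, 0.9  # hypothetical corridor: states 0..9, goal at the right end
RIGHT = +1

def env_step(s, a):
    """Primitive transition in the corridor; reward 1 on reaching GOAL."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, (1.0 if s_next == GOAL else 0.0)

@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set
    policy: Callable[[int], int]        # intra-option policy
    should_stop: Callable[[int], bool]  # termination condition (deterministic here)

def run_option(option, s):
    """Execute an option to termination; return (s', discounted return R, duration k)."""
    assert option.can_start(s)
    ret, k = 0.0, 0
    while True:
        s, r = env_step(s, option.policy(s))
        ret += GAMMA ** k * r  # R = sum_i gamma^i * r_{i+1}
        k += 1
        if option.should_stop(s):
            return s, ret, k

def smdp_q_update(q, s, o, s_next, ret, k, alpha=0.5):
    """SMDP Q-learning: the whole macro-action is one decision lasting k steps."""
    target = ret + GAMMA ** k * max(q[s_next].values())
    q[s][o] += alpha * (target - q[s][o])

run_right = Option(lambda s: s < GOAL, lambda s: RIGHT, lambda s: s == GOAL)
s_next, ret, k = run_option(run_right, 0)
q = {s: {"left": 0.0, "right": 0.0, "run_right": 0.0} for s in range(N_STATES)}
smdp_q_update(q, 0, "run_right", s_next, ret, k)
print(s_next, k, q[0]["run_right"])
```

The bootstrap term is discounted by `GAMMA ** k` rather than `GAMMA`, which is what lets a k-step macro-action be learned about as one choice alongside the primitive actions.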


Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection

arXiv.org Machine Learning

Speech recognition systems have achieved high recognition performance for several tasks. However, the performance of such systems depends on the tremendously costly work of preparing vast amounts of task-matched transcribed speech data for supervised training. The key problem here is the cost of transcribing speech data, a cost incurred again for every new language and every new task. Assuming broad network services that transcribe speech data for many users, a system would become more self-sufficient and more useful if it could learn from very light feedback from the users without annoying them. In this paper, we propose a general reinforcement learning framework for speech recognition systems based on the policy gradient method. As a particular instance of the framework, we also propose a hypothesis selection-based reinforcement learning method. The proposed framework provides a new view of several existing training and adaptation methods. The experimental results show that the proposed method improves recognition performance compared to unsupervised adaptation.
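How a policy gradient can learn from very light feedback can be illustrated with plain REINFORCE over an N-best list. This is a minimal sketch under stated assumptions, not the paper's hypothesis selection method: each hypothesis is described by a hypothetical feature vector, the system samples one hypothesis from a linear-softmax policy, and a simulated user returns only accept (1) or reject (0). The hidden `W_TRUE` that drives the simulated user, the features, and the hyperparameters are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BEST, N_FEATURES = 3, 2
W_TRUE = np.array([1.0, 2.0])  # hypothetical hidden preference, used only to simulate the user

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sample_utterance():
    """Hypothetical N-best list: one feature vector (e.g. acoustic and LM scores) per hypothesis."""
    phi = rng.normal(size=(N_BEST, N_FEATURES))
    correct = int(np.argmax(phi @ W_TRUE))
    return phi, correct

def reinforce_step(w, phi, chosen, reward, baseline, lr):
    """REINFORCE update for a linear-softmax policy over hypotheses."""
    probs = softmax(phi @ w)
    grad_log_pi = phi[chosen] - probs @ phi  # gradient of log pi(chosen | phi) w.r.t. w
    return w + lr * (reward - baseline) * grad_log_pi

w, baseline = np.zeros(N_FEATURES), 0.0
for _ in range(3000):
    phi, correct = sample_utterance()
    chosen = rng.choice(N_BEST, p=softmax(phi @ w))  # system shows one hypothesis
    reward = 1.0 if chosen == correct else 0.0       # simulated accept/reject feedback
    w = reinforce_step(w, phi, chosen, reward, baseline, lr=0.5)
    baseline += 0.01 * (reward - baseline)           # running-average baseline reduces variance

test_set = [sample_utterance() for _ in range(1000)]
accuracy = np.mean([np.argmax(phi @ w) == c for phi, c in test_set])
print(f"greedy hypothesis-selection accuracy: {accuracy:.2f}")
```

The update never needs a transcription, only a scalar reward per sampled hypothesis, which is the property the abstract is after; chance level here is one in three.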


AI Startup Embodied Intelligence Wants Robots to Learn From Humans in Virtual Reality

IEEE Spectrum Robotics

We are building technology that enables existing robot hardware to handle a much wider range of tasks where existing solutions break down, for example, bin picking of complex shapes, kitting, assembly, depalletizing of irregular stacks, and manipulation of deformable objects such as wires, cables, fabrics, linens, fluid-bags, and food. To equip existing robots with these skills, our software builds on the latest advances in deep reinforcement learning, deep imitation learning, and few-shot learning, to all of which the founding team has made significant contributions. The result isn't just a new set of skills in the robot's repertoire but teachable robots that can be deployed for new tasks with a short turnaround. The background here will be familiar to anyone who has followed Abbeel's research at UC Berkeley's Robot Learning Lab (RLL).