AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
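
As a rough illustration of "discovering which actions yield the most reward by trying them", here is a minimal sketch of epsilon-greedy action selection on a multi-armed bandit. The reward probabilities, step count, and the `run_bandit` name are assumptions made up for this example; they are not taken from the quoted book.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each action has been tried
    estimates = [0.0] * n_arms   # running average reward observed per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, since the learner is never told the best one.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the highest estimated reward so far.
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a numerical reward signal (hypothetical probabilities).
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated best action:", best_action)
```

The learner only ever sees rewards for the actions it actually tries, which is why some exploration (`epsilon`) is needed before the estimates become useful.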

Back to the core of intelligence … to really move to the future

#artificialintelligence Nov-13-2017, 17:55:04 GMT

Two decades ago I started working on metrics of machine intelligence. By that time, during the glacial days of the second AI winter, few were really interested in measuring something that AI lacked completely. And very few, such as David L. Dowe and I, were interested in metrics of intelligence linked to algorithmic information theory, where the models of interaction between an agent and the world were sequences of bits, and intelligence was formulated using Solomonoff's and Wallace's theories of inductive inference. In the meantime, seemingly dozens of variants of the Turing test were proposed every year, the CAPTCHAs were introduced and David showed how easy it is to solve some IQ tests using a very simple program based on a big-switch approach. And, today, a new AI spring has arrived, triggered by a blossoming machine learning field, bringing a more experimental approach to AI with an increasing number of AI benchmarks and competitions (see a previous entry in this blog for a survey).

artificial intelligence, machine learning, reinforcement learning, (8 more...)

#artificialintelligence

Country: Europe > Sweden > Skåne County > Malmö (0.06)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

ACtuAL: Actor-Critic Under Adversarial Learning

Goyal, Anirudh, Ke, Nan Rosemary, Lamb, Alex, Hjelm, R Devon, Pal, Chris, Pineau, Joelle, Bengio, Yoshua

arXiv.org Machine Learning Nov-13-2017

Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via back-propagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling and we use it to achieve improvements on language modeling tasks over the standard Teacher-Forcing methods.
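
The idea of training a generator from a learned critic with a TD objective, rather than from gradients passed through a discriminator, can be sketched with a tiny tabular actor-critic. This is not the paper's implementation: the per-position state, the two-token vocabulary, and the reward (1 whenever token 1 is emitted, standing in for a discriminator-style signal) are all assumptions chosen to keep the example small.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(seq_len=3, vocab=2, episodes=2000,
                       alpha_v=0.1, alpha_pi=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Critic: one value per sequence position; values[seq_len] is the terminal state.
    values = [0.0] * (seq_len + 1)
    # Actor ("generator"): one row of token logits per sequence position.
    logits = [[0.0] * vocab for _ in range(seq_len)]
    for _ in range(episodes):
        for t in range(seq_len):
            probs = softmax(logits[t])
            # Sampling a discrete token is the non-differentiable step.
            token = rng.choices(range(vocab), weights=probs)[0]
            # Hypothetical reward, standing in for a learned signal.
            reward = 1.0 if token == 1 else 0.0
            # One-step TD error: r + gamma * V(next) - V(current).
            td_error = reward + gamma * values[t + 1] - values[t]
            # Critic update toward the TD target.
            values[t] += alpha_v * td_error
            # Actor update: TD error times the gradient of log pi(token).
            for k in range(vocab):
                grad_log_pi = (1.0 if k == token else 0.0) - probs[k]
                logits[t][k] += alpha_pi * td_error * grad_log_pi
    return values, [softmax(row) for row in logits]


values, policy = train_actor_critic()
print([round(p[1], 3) for p in policy])
```

No gradient flows through the sampling step here; the actor only needs the scalar TD error, which is what makes this style of update usable with discrete outputs.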

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1711.04755

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Safe Model-based Reinforcement Learning with Stability Guarantees

Berkenkamp, Felix, Turchetta, Matteo, Schoellig, Angela P., Krause, Andreas

arXiv.org Machine Learning Nov-13-2017

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.
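
To give a feel for what a Lyapunov-based safety check looks like, the sketch below verifies the decrease condition V(f(x)) < V(x) on a grid of states and reports the largest level set of V on which it holds. It is a heavily simplified illustration, not the paper's algorithm: the dynamics are assumed known (an Euler step of dx/dt = -x + x^3, chosen for this example), V(x) = x^2 is an assumed Lyapunov candidate, and the check ignores both model uncertainty and states between grid points.

```python
def dynamics(x, dt=0.1):
    """Hypothetical closed-loop dynamics: one Euler step of dx/dt = -x + x**3."""
    return x + dt * (-x + x ** 3)


def lyapunov(x):
    """Assumed Lyapunov candidate V(x) = x^2."""
    return x * x


def largest_safe_level(states):
    """Largest grid level c such that V decreases at every nonzero state with V(x) <= c."""
    # Levels of the grid states where the decrease condition fails.
    violating = [lyapunov(x) for x in states
                 if x != 0.0 and lyapunov(dynamics(x)) >= lyapunov(x)]
    if not violating:
        return max(lyapunov(x) for x in states)
    threshold = min(violating)
    # Keep only states strictly inside the smallest violating level set.
    safe = [lyapunov(x) for x in states if lyapunov(x) < threshold]
    return max(safe) if safe else 0.0


grid = [i / 100 for i in range(-200, 201)]   # states from -2.0 to 2.0
safe_level = largest_safe_level(grid)

# A trajectory that starts inside the verified level set stays there and converges.
x = 0.5
for _ in range(200):
    x = dynamics(x)
print(safe_level, x)
```

On this toy system the verified region is the set of grid states with |x| < 1, which matches the region of attraction of the continuous-time dynamics.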

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1705.08551

Country:

North America > United States (0.67)
North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Berkeley startup to train robots like puppets

@machinelearnbot Nov-12-2017, 16:50:09 GMT

Robots today must be programmed by writing computer code, but imagine donning a VR headset and virtually guiding a robot through a task, like you would move the arms of a puppet, and then letting the robot take it from there. That's the vision of Pieter Abbeel, a professor of electrical engineering and computer science at the University of California, Berkeley, and his students, Peter Chen, Rocky Duan and Tianhao Zhang, who have launched a startup, Embodied Intelligence Inc., to use the latest techniques of deep reinforcement learning and artificial intelligence to make industrial robots easily teachable. "Right now, if you want to set up a robot, you program that robot to do what you want it to do, which takes a lot of time and a lot of expertise," said Abbeel, who is currently on leave to turn his vision into reality. "With our advances in machine learning, we can write a piece of software once -- machine learning code that enables the robot to learn -- and then when the robot needs to be equipped with a new skill, we simply provide new data." The "data" is training, much like you'd train a human worker, though with the added dimension of virtual reality.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

@machinelearnbot

Country: North America > United States > California > Alameda County > Berkeley (0.25)

Industry:

Education (0.71)
Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)

1107_release

#artificialintelligence Nov-12-2017, 14:35:22 GMT

Building on the founders' pioneering research in deep imitation learning, deep reinforcement learning and meta-learning, Embodied Intelligence is developing AI software (aka robot brains) that can be loaded onto any existing robots. While traditional programming of robots requires writing code, a time-consuming endeavor even for robotics experts, Embodied Intelligence software will empower anyone to program a robot by simply donning a VR headset and guiding a robot through a task. These human demonstrations train deep neural nets, which are further tuned through the use of reinforcement learning, resulting in robots that can be easily taught a wide range of skills in areas where existing solutions break down. Several kinds of complicated tasks are candidates to benefit from Embodied Intelligence's work: the manipulation of deformable objects such as wires, fabrics, linens, apparel, fluid-bags, and food; picking parts and order items out of cluttered, unstructured bins; and completing assemblies where hard automation struggles due to variability in parts, configurations, and individualization of orders.

machine learning, reinforcement learning, robot, (3 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Deep reinforcement learning: where to start – freeCodeCamp

#artificialintelligence Nov-10-2017, 20:55:20 GMT

More than 200 million people watched as reinforcement learning (RL) took to the world stage. A few years earlier, DeepMind had made waves with a bot that could play Atari games. The company was soon acquired by Google. Many researchers believe that RL is our best shot at creating artificial general intelligence. It is an exciting field, with many unsolved challenges and huge potential.

machine learning, q-function, reinforcement learning, (17 more...)

#artificialintelligence

Country: Europe > Netherlands > South Holland > Rotterdam (0.05)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

[D] What do you feel is currently undervalued / underappreciated in the field of machine learning? • r/MachineLearning

@machinelearnbot Nov-10-2017, 08:25:10 GMT

Good reinforcement learning and other 'reasoning' benchmarks to measure progress, some set of increasingly harder tasks that can measurably show the different strengths of various models. My thoughts are that it wasn't just the data, but everything around image-net that really pushed the field forward: the yearly competition, the talks and progress graphs, the anticipation and excitement to see how far the teams pushed the limit this time. Reinforcement learning still needs its 'image-net moment', ideally some annual competition that can gain traction over time and have the big teams invest resources to push the limits. The field lends itself well to simply adding more complex tasks as the models get stronger and stronger. I merely answered this question as in 'what would I as an outsider like to see', so feel free to disregard, but I think there is something in human nature about competition which drives progress.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

Learning in Brains and Machines (3): Synergistic and Modular Action

@machinelearnbot Nov-10-2017, 03:20:08 GMT

Action synergies in the brain are paralleled by macro-actions and options in machines. In both brains and machines, these tools enable fast learning, strong generalisation and flexible action. But most importantly, for those of us who seek a deeper understanding of the brain, and of human and machine intelligence, this has illuminated the principles of modularity and abstraction--invaluable principles of biological and computational learning.

machine learning, reinforcement learning, synergy, (14 more...)

@machinelearnbot

Industry: Health & Medicine > Therapeutic Area > Neurology (0.97)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection

Kato, Taku, Shinozaki, Takahiro

arXiv.org Machine Learning Nov-9-2017

Speech recognition systems have achieved high recognition performance for several tasks. However, the performance of such systems is dependent on the tremendously costly development work of preparing vast amounts of task-matched transcribed speech data for supervised training. The key problem here is the cost of transcribing speech data. The cost is repeatedly required to support new languages and new tasks. Assuming broad network services for transcribing speech data for many users, a system would become more self-sufficient and more useful if it possessed the ability to learn from very light feedback from the users without annoying them. In this paper, we propose a general reinforcement learning framework for speech recognition systems based on the policy gradient method. As a particular instance of the framework, we also propose a hypothesis selection-based reinforcement learning method. The proposed framework provides a new view for several existing training and adaptation methods. The experimental results show that the proposed method improves the recognition performance compared to unsupervised adaptation.
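
A policy-gradient view of hypothesis selection can be sketched as follows: a softmax policy picks one of several candidate transcriptions, a light binary signal says whether the pick was accepted, and a REINFORCE-style update adjusts the policy. Everything concrete here is an assumption for illustration (the two-dimensional hypothesis features, the toy utterances, the simulated accept/reject feedback, and the `reinforce` name); the paper's actual models and rewards are not reproduced.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selection_probs(theta, hypotheses):
    """Softmax policy over hypotheses, scored by a linear function of their features."""
    scores = [sum(w * f for w, f in zip(theta, feats)) for feats in hypotheses]
    return softmax(scores)


def reinforce(utterances, steps=3000, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy parameters, one per (hypothetical) feature
    for _ in range(steps):
        hypotheses, accepted = rng.choice(utterances)
        probs = selection_probs(theta, hypotheses)
        chosen = rng.choices(range(len(hypotheses)), weights=probs)[0]
        # Simulated light user feedback: 1 if the shown hypothesis was accepted.
        reward = 1.0 if chosen == accepted else 0.0
        # REINFORCE: theta += lr * reward * grad log pi(chosen),
        # where grad log pi(chosen) = f_chosen - sum_k p_k * f_k.
        for d in range(len(theta)):
            expected = sum(p * h[d] for p, h in zip(probs, hypotheses))
            theta[d] += lr * reward * (hypotheses[chosen][d] - expected)
    return theta


# Toy data: (feature vectors of the candidate hypotheses, index the user would accept).
UTTERANCES = [
    ([[1.0, 0.2], [0.3, 0.9], [0.5, 0.5]], 1),
    ([[0.8, 0.1], [0.2, 0.7]], 1),
    ([[0.9, 0.3], [0.4, 0.8], [0.6, 0.4]], 1),
]

theta = reinforce(UTTERANCES)
accept_probs = [selection_probs(theta, hyps)[ok] for hyps, ok in UTTERANCES]
print([round(p, 3) for p in accept_probs])
```

Note that no transcription is ever supplied: the only supervision is the accept/reject signal on the hypothesis that was shown.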

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1711.03689

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

AI Startup Embodied Intelligence Wants Robots to Learn From Humans in Virtual Reality

IEEE Spectrum Robotics Nov-8-2017, 23:00:08 GMT

We are building technology that enables existing robot hardware to handle a much wider range of tasks where existing solutions break down, for example, bin picking of complex shapes, kitting, assembly, depalletizing of irregular stacks, and manipulation of deformable objects such as wires, cables, fabrics, linens, fluid-bags, and food. To equip existing robots with these skills, our software builds on the latest advances in deep reinforcement learning, deep imitation learning, and few-shot learning, to all of which the founding team has made significant contributions. The result isn't just a new set of skills in the robot repertoire, but teachable robots, that can be deployed for new tasks on short turn-around. The background here will be familiar to anyone who has followed Abbeel's research at UC Berkeley's Robot Learning Lab (RLL).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

IEEE Spectrum Robotics

AI-Alerts: 2017 > 2017-11 > AAAI AI-Alert for Nov 14, 2017 (1.00)

Industry: Education (0.51)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.41)
