Generative AI
Learning from Human Preferences
One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind's safety team, we've developed an algorithm that can infer what humans want by being told which of two proposed behaviors is better. We present a learning algorithm that uses small amounts of human feedback to solve modern RL environments. Machine learning systems with human feedback have been explored before, but we've scaled up the approach to work on much more complicated tasks. Our algorithm needed 900 bits of feedback from a human evaluator to learn to backflip: a seemingly simple task that is easy to judge but challenging to specify.
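The central idea is to infer a reward from answers to "which of these two behaviors is better?". A common way to model such pairwise answers is a Bradley-Terry model: the probability that A is preferred is a sigmoid of the difference in predicted reward. The sketch below is a toy under stated assumptions. The linear reward, the two-dimensional segment features, the simulated evaluator, and all function names are hypothetical. It is not the OpenAI/DeepMind implementation.

```python
import math
import random

# Toy sketch, not the OpenAI/DeepMind implementation: each trajectory segment is
# summarised by a hypothetical feature vector, and the reward model is linear.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predicted_reward(weights, features):
    """Learned reward estimate for one segment."""
    return sum(w * f for w, f in zip(weights, features))

def preference_prob(weights, seg_a, seg_b):
    """Probability that the human prefers segment A over B (Bradley-Terry model)."""
    return sigmoid(predicted_reward(weights, seg_a) - predicted_reward(weights, seg_b))

def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit reward weights to comparisons by SGD on the cross-entropy loss.

    comparisons: list of (seg_a, seg_b, label); label is 1.0 if the human
    preferred A and 0.0 if they preferred B (one bit of feedback each).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            # d(loss)/d(weights) = (p - label) * (seg_a - seg_b)
            for i in range(n_features):
                weights[i] -= lr * (p - label) * (seg_a[i] - seg_b[i])
    return weights

if __name__ == "__main__":
    rng = random.Random(0)
    true_weights = [1.0, -2.0]  # hypothetical stand-in for "what the human wants"
    segments = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)]
    comparisons = []
    for _ in range(80):
        a, b = rng.sample(segments, 2)
        # Simulated evaluator answering "which of these two is better?"
        better = predicted_reward(true_weights, a) > predicted_reward(true_weights, b)
        comparisons.append((a, b, 1.0 if better else 0.0))
    print(train_reward_model(comparisons, n_features=2))
```

In a full system, an RL agent would then be trained against the learned reward, and new comparisons would be requested as its behavior changes. That loop is not shown here.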
OpenAI, DeepMind double team to make future AI machines safer
Researchers from OpenAI and DeepMind are hoping to make artificial intelligence safer using a new algorithm that learns from human feedback. Both companies are experts in reinforcement learning, an area of machine learning that rewards agents for taking the right actions to complete a task in a given environment. The goal is specified through a reward function, and the agent is programmed to chase the reward, like winning points in a game. Reinforcement learning has been successful in teaching machines to play games like Doom or Pong and to drive autonomous cars in simulation. It's a powerful method for exploring an agent's behavior, but it can be dangerous if the hard-coded reward function is wrong or produces undesirable effects.
Learning to Cooperate, Compete, and Communicate
Multiagent environments where agents compete for resources are stepping stones on the path to AGI. Multiagent environments have two useful properties. First, there is a natural curriculum: the difficulty of the environment is determined by the skill of your competitors (and if you're competing against clones of yourself, the environment exactly matches your skill level). Second, a multiagent environment has no stable equilibrium: no matter how smart an agent is, there's always pressure to get smarter. These environments have a very different feel from traditional environments, and it'll take a lot more research before we become good at them. We've developed a new algorithm, MADDPG, for centralized learning and decentralized execution in multiagent environments, allowing agents to learn to collaborate and compete with each other.
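"Centralized learning and decentralized execution" is mostly about who sees what. Each agent's policy acts on its own observation only. During training, a critic for each agent may see every agent's observations and actions. The sketch below shows only that information structure. Linear maps stand in for neural networks, the class names are hypothetical, and the actual MADDPG gradient updates are omitted.

```python
import random

# Structural sketch of centralized training / decentralized execution.
# Linear maps stand in for neural networks; learning updates are omitted.

class LinearActor:
    """Decentralized policy: maps ONLY this agent's observation to an action."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(act_dim)]

    def act(self, own_obs):
        return [sum(w * o for w, o in zip(row, own_obs)) for row in self.weights]


class CentralizedCritic:
    """Training-time value estimate Q_i that sees every agent's obs and action."""

    def __init__(self, n_agents, obs_dim, act_dim, rng):
        self.input_dim = n_agents * (obs_dim + act_dim)
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(self.input_dim)]

    def q_value(self, all_obs, all_actions):
        joint = [x for obs in all_obs for x in obs] + [x for act in all_actions for x in act]
        if len(joint) != self.input_dim:
            raise ValueError("centralized critic needs every agent's observation and action")
        return sum(w * x for w, x in zip(self.weights, joint))


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, obs_dim, act_dim = 3, 4, 2
    actors = [LinearActor(obs_dim, act_dim, rng) for _ in range(n_agents)]
    critics = [CentralizedCritic(n_agents, obs_dim, act_dim, rng) for _ in range(n_agents)]

    observations = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
    # Execution: each agent acts on its own observation only.
    actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
    # Training: each agent's critic scores the joint observation-action.
    print([critic.q_value(observations, actions) for critic in critics])
```

At deployment only the actors are needed, which is why execution stays decentralized even though training used global information.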
OpenAI's new approach for one-shot imitation learning, a peek into the future of AI
On May 16, OpenAI researchers shared a video of one of their projects, along with two important papers exploring solutions to three key bottlenecks in current AI development: meta-learning, one-shot learning, and automated data generation. In my previous post, I promised an article dedicated to the fascinating problem of one-shot learning, so here goes. The video shows a one-armed physical robot stacking cubes on top of each other. Given the complex tasks that industrial robots can already perform, the demo would look underwhelming in many respects if a researcher were not explaining what is going on. In a controlled environment the task is simple, and procedural (hard-coded) approaches have already solved problems like this. What is promising, and potentially revolutionary, is how far the general framework underneath could scale to multiple, more complex and adaptive behaviors in noisier environments.
Elon Musk's OpenAI breaks new ground in AI research - IoT Agenda
At the core of the AI system are two different neural networks: a vision network and an imitation network. Working together behind the scenes, they give the system the remarkable ability to imitate human actions, a giant step closer to building true AI systems. A robotic arm reproduces the process of picking up blocks and stacking them in a particular configuration after watching, just once, a simulated demonstration performed by a human using a VR headset. Researchers used thousands of simulated images to train the vision network.
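To make "imitate after a single demonstration" concrete, here is a hand-written toy standing in for an imitation network. It assumes the vision stage has already produced object positions. It then blends the demonstration's actions with soft attention weights, favoring demonstration steps whose states look most like the current state. This is an illustrative nonparametric stand-in, not OpenAI's architecture, and the function names and the `temperature` parameter are hypothetical.

```python
import math

# Toy stand-in for an imitation network (NOT OpenAI's architecture).
# Assumes a vision network has already turned camera images into object
# positions; actions are blended by soft attention over demonstration states.

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def imitate(demo, state, temperature=0.1):
    """Pick an action for `state` given ONE demonstration.

    demo: list of (demo_state, demo_action) pairs from a single demonstration.
    state: current object positions, e.g. from a vision network.
    """
    # Closer demonstration states get higher (less negative) scores.
    scores = [-sum((d - s) ** 2 for d, s in zip(demo_state, state)) / temperature
              for demo_state, _ in demo]
    weights = softmax(scores)
    act_dim = len(demo[0][1])
    return [sum(w * action[i] for w, (_, action) in zip(weights, demo))
            for i in range(act_dim)]

if __name__ == "__main__":
    # Hypothetical 2-D demo: move right, then up, then stop.
    demo = [((0.0, 0.0), (1.0, 0.0)),
            ((1.0, 0.0), (0.0, 1.0)),
            ((1.0, 1.0), (0.0, 0.0))]
    print(imitate(demo, (0.95, 0.05)))  # close to the second step's action (0, 1)
```

Because the demonstration is an input rather than something baked into the weights, a new demonstration changes the behavior without retraining. That is the property one-shot imitation relies on.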
Elon Musk's OpenAI breaks new ground in AI research
Elon Musk keeps surprising the world with his technological breakthroughs. OpenAI, a non-profit AI research company, recently announced a groundbreaking AI invention: a system that can complete an actual physical task after watching just one demonstration of the task.
[R] [1706.00550] On Unifying Deep Generative Models • r/MachineLearning
Deep generative models have achieved impressive success in recent years. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), two powerful frameworks for deep generative model learning, have largely been considered distinct paradigms, and each has received extensive independent study. This paper establishes formal connections between deep generative modeling approaches through a new formulation of GANs and VAEs. We show that GANs and VAEs essentially minimize KL divergences in opposite directions with reversed latent/visible treatments, extending the two learning phases of the classic wake-sleep algorithm, respectively. The unified view provides a powerful tool to analyze a diverse set of existing model variants, and enables ideas to be exchanged across research lines in a principled way.
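The abstract's central claim turns on the direction of a KL divergence. Direction matters because KL is not symmetric. A few lines of Python show that swapping the arguments changes the value, using two arbitrary discrete distributions. This is a generic illustration of the asymmetry, not the paper's formulation of GANs or VAEs.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete distributions.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    p = [0.45, 0.10, 0.45]   # hypothetical bimodal "data" distribution
    q = [0.10, 0.80, 0.10]   # hypothetical unimodal "model" distribution
    print("KL(p || q) =", kl_divergence(p, q))
    print("KL(q || p) =", kl_divergence(q, p))
```

Minimizing KL(p || q) over a model q and minimizing KL(q || p) generally lead to different fits. That is why "opposite directions" is a substantive distinction rather than a notational one.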
Robots that Learn
Last month, we showed an earlier version of this robot where we'd trained its vision system using domain randomization, that is, by showing it simulated objects with a variety of colors, backgrounds, and textures, without the use of any real images. Now, we've developed and deployed a new algorithm, one-shot imitation learning, allowing a human to communicate how to do a new task by performing it in VR. Given a single demonstration, the robot is able to solve the same task from an arbitrary starting configuration. Caption: Our system can learn a behavior from a single demonstration delivered within a simulator, then reproduce that behavior in different setups in reality. The system is powered by two neural networks: a vision network and an imitation network. The vision network ingests an image from the robot's camera and outputs a state representing the positions of the objects.
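Domain randomization, as described here, amounts to sampling many appearance parameters per synthetic scene. The vision network then learns to predict object positions regardless of color, background, or texture. The sketch below only samples scene specifications and their ground-truth positions. The field names, value ranges, and texture count are hypothetical, and the rendering step (a simulator turning each spec into an image) is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]

@dataclass
class RandomizedScene:
    """Parameters for one synthetic training image (hypothetical fields)."""
    object_colors: List[RGB]
    background: RGB
    texture_id: int
    # Ground-truth label the vision network would learn to predict.
    object_positions: List[Tuple[float, float]]

def random_rgb(rng: random.Random) -> RGB:
    return (rng.random(), rng.random(), rng.random())

def sample_scene(rng: random.Random, n_objects: int,
                 n_textures: int = 50, half_width: float = 0.5) -> RandomizedScene:
    """Randomize appearance (colors, background, texture) and object placement."""
    return RandomizedScene(
        object_colors=[random_rgb(rng) for _ in range(n_objects)],
        background=random_rgb(rng),
        texture_id=rng.randrange(n_textures),
        object_positions=[(rng.uniform(-half_width, half_width),
                           rng.uniform(-half_width, half_width))
                          for _ in range(n_objects)],
    )

def make_dataset(n_scenes: int, n_objects: int, seed: int = 0) -> List[RandomizedScene]:
    """Sample scene specs; a simulator (not shown) would render each into an image."""
    rng = random.Random(seed)
    return [sample_scene(rng, n_objects) for _ in range(n_scenes)]

if __name__ == "__main__":
    for scene in make_dataset(n_scenes=3, n_objects=2):
        print(scene)
```

Seeding the generator makes a dataset reproducible. Varying the seed yields fresh appearances, which is the point of randomizing everything except the quantity the network must predict.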