Large Language Model
A Guide to Machine Learning PhDs
A machine learning PhD not only opens up some of the highest-paying jobs around but also sets you up to have an outsized positive impact on the world. This comprehensive guide to machine learning PhDs from 80,000 Hours (YC S15) will help you get started. The guide is based on discussions with six machine learning researchers, including two at DeepMind, one at OpenAI, and one running a robotics start-up. Check out the highlights below. Machine learning involves giving software rules for learning from experience, rather than directly programming the steps it takes.
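As a minimal illustration of "learning from experience rather than programming the steps": instead of hand-writing the rule y = 2x + 1, a program can recover it from example input/output pairs. The sketch below uses ordinary least squares, about the simplest learning rule there is; the data and the function name `fit_line` are illustrative assumptions, not taken from the guide.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the best-fit slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Experience": example inputs and the outputs observed for them.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # → 2.0 1.0
```

The rule itself never appears in the code; only the examples do, which is the distinction the guide draws.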
'Sonic the Hedgehog' is Teaching AI How to Learn
Researchers at OpenAI have already proven AI can get really good at video games. Now they are teaching AI how to learn games quickly, like a human would. That's why they've challenged developers to submit their own code for an AI-only Sonic the Hedgehog competition.
AI Safety via Debate
We're proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins. We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences. We outline this method along with preliminary proof-of-concept experiments, and we are also releasing a web interface so people can experiment with the technique. Visualized as a game tree, the debate method resembles a game like Go, but with sentences between debaters as moves and human judgements at the leaf nodes. In both debate and Go, the true answer depends on the entire tree, but a single path through the tree chosen by strong agents is evidence for the whole.
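The game-tree framing can be made concrete with a toy minimax search. This is a hypothetical illustration, not OpenAI's debate implementation: the tree, the +1/-1 verdict encoding, and the functions `minimax` and `follow` are assumptions. Debater A chooses statements to maximize the judge's verdict and debater B to minimize it; the judge only sees the single path that optimal play produces, yet that path's verdict equals the minimax value of the whole tree.

```python
from typing import List, Optional, Tuple, Union

# A node is either a judge's verdict at a leaf (+1: judge sides with debater A,
# -1: judge sides with debater B) or a list of child nodes, one per possible statement.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, a_to_move: bool) -> Tuple[int, List[int]]:
    """Return (verdict under optimal play, indices of the statements chosen along the way)."""
    if isinstance(node, int):
        return node, []
    best_value: Optional[int] = None
    best_path: List[int] = []
    for i, child in enumerate(node):
        value, path = minimax(child, not a_to_move)
        # A maximizes the verdict, B minimizes it; ties keep the earlier statement.
        if best_value is None or (value > best_value if a_to_move else value < best_value):
            best_value, best_path = value, [i] + path
    assert best_value is not None, "internal nodes must have at least one child"
    return best_value, best_path


def follow(node: Tree, path: List[int]) -> int:
    """Walk one path of statements and return the verdict the judge sees at its leaf."""
    for i in path:
        node = node[i]
    assert isinstance(node, int)
    return node


# Toy debate: A makes an opening statement (0 or 1), then B rebuts (0 or 1).
tree: Tree = [
    [+1, -1],  # opening 0 has a winning rebuttal for B
    [+1, +1],  # opening 1 survives every rebuttal
]
value, path = minimax(tree, a_to_move=True)
print(value, path)  # → 1 [1, 0]
print(follow(tree, path) == value)  # → True
```

Here B can refute opening statement 0, so a strong A opens with statement 1, after which every rebuttal still ends in a +1 verdict. The judge evaluates only one leaf, but that leaf reflects the whole tree.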
How can we be sure AI will behave? Perhaps by watching it argue with itself.
Someday, it might be perfectly normal to watch an AI system fight with itself. The concept comes from researchers at OpenAI, a nonprofit founded by several Silicon Valley luminaries, including Y Combinator partner Sam Altman, LinkedIn chair Reid Hoffman, Facebook board member and Palantir founder Peter Thiel, and Tesla and SpaceX head Elon Musk. The OpenAI researchers have previously shown that AI systems that train themselves can sometimes develop unexpected and unwanted habits. For example, in a computer game, an agent may figure out how to "glitch" its way to a higher score. In some cases it may be possible for a person to supervise the training process.
A new form of "Master algorithm" could pave the way for super intelligent machines
You can be excused for not noticing that Daniel Buehrer, a retired professor from National Chung Cheng University in Taiwan, recently published a white paper proposing a new class of mathematics that many feel could one day lead to the birth of machine "consciousness," and perhaps even to Artificial Super Intelligence (ASI) itself, which some forecasts place around 2045. After all, keeping up with every breakthrough in Artificial Intelligence (AI), from new Artificial General Intelligence (AGI) architectures to self-evolving AIs, such as DeepMind's, that fight each other, can be exhausting. Robot consciousness, or sentient machines, has long been a touchy subject in AI circles, if for no other reason than that we still aren't able to describe what consciousness really is, let alone how it came to be. To have a discussion about a computer that can 'feel' and 'think,' and that has its own aspirations and motivations, you first have to find two people who actually agree on the semantics of sentience. And if you manage that, you'll then have to wade through myriad hypothetical objections to any theoretical living AI you can come up with.
Facebook's Go-playing AI is a free download
Much as IBM's Watson once demonstrated the power of AI by becoming a Jeopardy champion, DeepMind's AlphaGo has been beating the world's best Go players, something AI researchers had long aspired to and which once seemed unattainable. Here at the F8 conference, Facebook CTO Mike Schroepfer lavished praise on DeepMind (a division of Google) for its accomplishment, and then began talking about ELF OpenGo, Facebook's own reimplementation of DeepMind's technology. Although he readily admitted that Facebook's version isn't the world's best Go-playing technology, it recently took on four top-30 human Go players, running on a computer with a single GPU powering its computations, and won 14-0.
Facebook's open-source Go bot can now beat professional players
Go is the go-to game for machine learning researchers. It's what Google's DeepMind team famously used to show off its algorithms, and Facebook, too, recently announced that it was building a Go bot of its own. As the team announced at the company's F8 developer conference today, the ELF OpenGo bot has now achieved professional status after winning all 14 games it played against a group of top-30 human Go players. "We salute our friends at DeepMind for doing awesome work," Facebook CTO Mike Schroepfer said in today's keynote. "But we wondered: Are there some unanswered questions? What else can you apply these tools to?"
DeepMind papers at NIPS 2017
Learning in models with discrete latent variables is challenging because the available gradient estimators have high variance. Previous approaches produced either high-variance, unbiased gradients or low-variance, biased gradients. REBAR uses control variates and the reparameterization trick to get the best of both: low-variance, unbiased gradients that result in faster convergence to a better result.
"We describe a new family of approaches for imagination-based planning... We also introduce architectures which provide new ways for agents to learn and construct plans to maximise the efficiency of a task. These architectures are efficient, robust to complex and imperfect models, and can adopt flexible strategies for exploiting their imagination. The agents we introduce benefit from an 'imagination encoder', a neural network which learns to extract any information useful for the agent's future decisions, but ignore that which is not relevant."
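REBAR itself builds its control variate from a continuous (Concrete) relaxation of the discrete variable, using the reparameterization trick. The sketch below does not implement REBAR; it shows only the simpler principle REBAR relies on: subtracting a control variate (here a constant baseline) from the score-function (REINFORCE) estimator keeps the gradient unbiased while lowering its variance. The toy objective f(b) = (b - t)^2, the target t = 0.45, θ = 0, and the helper names are all assumptions, and mean and variance are computed exactly by enumerating both outcomes of a single Bernoulli variable.

```python
import math
from typing import Callable, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_function_stats(theta: float, f: Callable[[int], float],
                         baseline: float = 0.0) -> Tuple[float, float]:
    """Exact mean and variance of (f(b) - baseline) * d/dtheta log p(b),
    for b ~ Bernoulli(sigmoid(theta)), by enumerating b in {0, 1}."""
    p = sigmoid(theta)
    outcomes = []
    for b, prob in ((1, p), (0, 1.0 - p)):
        score = b - p  # d/dtheta log p(b) for a Bernoulli with logit theta
        outcomes.append((prob, (f(b) - baseline) * score))
    mean = sum(prob * g for prob, g in outcomes)
    var = sum(prob * (g - mean) ** 2 for prob, g in outcomes)
    return mean, var


t = 0.45  # hypothetical target value


def f(b: int) -> float:
    """Toy objective of a discrete sample (assumption)."""
    return (b - t) ** 2


theta = 0.0
p = sigmoid(theta)
true_grad = (1 - 2 * t) * p * (1 - p)  # analytic d/dtheta E[f(b)]

plain = score_function_stats(theta, f)
expected_f = p * f(1) + (1 - p) * f(0)  # ideal constant baseline
with_cv = score_function_stats(theta, f, baseline=expected_f)
print(true_grad, plain, with_cv)
```

Both estimators have the same mean (the true gradient) because the score has zero expectation, but the baselined one has far lower variance. In practice E[f] is unknown, which is why methods like REBAR construct their control variates differently.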
Zero-Shot Visual Imitation
The current dominant paradigm of imitation learning relies on strong supervision of expert actions for learning both what to imitate and how to imitate it. We propose an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its own experience into a goal-conditioned skill policy using a novel forward consistency loss formulation. In our framework, the role of the human expert is only to communicate goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic the expert (i.e., how to imitate) after observing just a visual demonstration. Our method is "zero-shot" in the sense that the agent never has access to expert actions either during training or for task demonstration at inference.
Zero-Shot Visual Imitation
Deepak Pathak, Parsa Mahmoudieh, Guanghao Luo, Pulkit Agrawal, Dian Chen, Yide Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, Trevor Darrell
The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both what and how to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic the expert (i.e., how to imitate) after seeing just a sequence of images demonstrating the desired task. Our method is "zero-shot" in the sense that the agent never has access to expert actions during training or for the task demonstration at inference. We evaluate our zero-shot imitator in two real-world settings: complex rope manipulation with a Baxter robot and navigation in previously unseen office environments with a TurtleBot. Through further experiments in VizDoom simulation, we provide evidence that better mechanisms for exploration lead to learning a more capable policy which in turn improves end task performance.
Imitating expert demonstration is a powerful mechanism for learning to perform tasks from raw sensory observations. The current dominant paradigm in learning from demonstration (LfD) (Argall et al., 2009; Ng & Russell, 2000; Pomerleau, 1989; Schaal, 1999) requires the expert to either manually move the robot joints (i.e., kinesthetic teaching) or teleoperate the robot to execute the desired task. The expert typically provides multiple demonstrations of a task at training time, and this generates data in the form of observation-action pairs from the agent's point of view. Such a heavily supervised approach, where it is necessary to provide demonstrations by controlling the robot, is incredibly tedious for the human expert.
Moreover, for every new task that the robot needs to execute, the expert is required to provide a new set of demonstrations. Instead of communicating how to perform a task via observation-action pairs, a more general formulation allows the expert to communicate only what needs to be done by providing the observations of the desired world states via a video or a sparse sequence of images. This way, the agent is required to infer how to perform the task (i.e., actions) by itself.
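The motivation for a forward consistency loss can be shown with a toy example: different actions can lead to the same state, so penalizing only the mismatch with the recorded action is needlessly harsh. The sketch below is an assumption-laden simplification, not the paper's implementation: the paper learns a goal-conditioned skill policy and a forward model as neural networks, whereas here the forward model is a hand-written 1-D clipped motion, the losses are squared errors, and the weight `lam` on the action term is a hypothetical choice.

```python
def forward_model(x: float, a: float) -> float:
    """Hypothetical 1-D dynamics: move by a, but positions are clipped to [0, 10]."""
    return min(max(x + a, 0.0), 10.0)


def action_loss(a_true: float, a_pred: float) -> float:
    """Plain action-matching loss: the predicted action must equal the recorded one."""
    return (a_true - a_pred) ** 2


def forward_consistency_loss(x: float, x_next: float, a_true: float,
                             a_pred: float, lam: float = 0.1) -> float:
    """Penalize where the predicted action leads (via the forward model),
    plus a small weighted action-matching term (weighting is an assumption)."""
    x_hat = forward_model(x, a_pred)
    return (x_next - x_hat) ** 2 + lam * action_loss(a_true, a_pred)


# The agent was at x = 9, took action 1, and ended at the wall (x = 10).
# Predicting action 3 reaches the same state, so it should be penalized only lightly.
x, a_true = 9.0, 1.0
x_next = forward_model(x, a_true)
print(action_loss(a_true, 3.0))  # → 4.0
print(forward_consistency_loss(x, x_next, a_true, 3.0))
```

Under action matching alone, predicting 3 costs 4.0; under the forward-consistency-style loss it costs only the small weighted action term, because the predicted action still reaches the demonstrated state.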