
Human-reinforced learning could mean 'more truthful and less toxic' AI


AI has been making huge leaps in scientific research, and companies like Nvidia and Meta continue to throw more resources at the technology. But AI learning can suffer a pretty huge setback when it adopts the prejudices of those who make it, like all those chatbots that wind up spewing hate speech thanks to their exposure to the criminally online. According to Golem, OpenAI might have made some headway on that with its new successor to GPT-3, the autoregressive language model that uses deep learning in an effort to appear human in text. It wrote this article, if you want an example of how that works.

Lessons Learned on Language Model Safety and Misuse


The deployment of powerful AI systems has enriched our understanding of safety and misuse far more than would have been possible through research alone. Here, we describe our latest thinking in the hope of helping other AI developers address the safety and misuse of deployed models. Over the past two years, we've learned a lot about how language models can be used and abused--insights we couldn't have gained without the experience of real-world deployment. In June 2020, we began giving developers and researchers access to the OpenAI API, an interface for accessing and building applications on top of new AI models developed by OpenAI. Deploying GPT-3, Codex, and other models in a way that reduces risks of harm has posed a variety of technical and policy challenges.

A New AI Trend: Chinchilla (70B) Greatly Outperforms GPT-3 (175B) and Gopher (280B)


DeepMind's latest paper dismantles the tired trend of building larger and larger models to improve performance. The company has found a key insight about scaling large language models that no one had applied before. OpenAI, Google, Microsoft, Nvidia, Facebook, and even DeepMind themselves, all big tech companies committed to creating powerful language models, have been doing it wrong: making models larger is neither the best nor the most efficient approach. Increasing model size as a proxy for increasing performance was established in 2020 by Kaplan and colleagues at OpenAI. They found a power law between those variables and concluded that, as more budget becomes available to train models, the majority should be allocated to making them bigger.
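The disagreement above boils down to how a fixed training compute budget C should be split between parameters N and training tokens D. Using the common approximation C ≈ 6·N·D, together with the Chinchilla paper's empirical finding that compute-optimal models use roughly 20 tokens per parameter, the trade-off can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the function name is mine, and the 20:1 ratio is a round-number heuristic for the paper's fitted result.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget between model size and data.

    Uses the common approximation C ~= 6 * N * D (training FLOPs for
    N parameters over D tokens) plus the Chinchilla heuristic
    D ~= 20 * N, which gives C ~= 6 * 20 * N**2, so N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A budget around 5.76e23 FLOPs (roughly what the paper reports for
# Chinchilla) recovers the headline numbers: ~70B params, ~1.4T tokens,
# rather than a Gopher-sized 280B model trained on far less data.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```

Under this rule, extra compute is spent about equally on width and data, whereas the 2020 Kaplan analysis pushed most of it into model size.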

Open AI gets GPT-3 to work by hiring an army of humans to fix GPT's bad answers. Interesting questions involving the mix of humans and computer algorithms in Open AI's GPT-3 program


The InstructGPT research did recruit 40 contractors to generate a dataset that GPT-3 was then fine-tuned on. But I [Quach] don't think those contractors are employed in an ongoing process to edit responses generated by the model. A spokesperson from the company just confirmed to me: "OpenAI does not hire copywriters to edit generated answers," so I don't think the claims are correct. So the above post was misleading. I'd originally titled it, "Open AI gets GPT-3 to work by hiring an army of humans to fix GPT's bad answers." I changed it to "Interesting questions involving the mix of humans and computer algorithms in Open AI's GPT-3 program." I appreciate all the helpful comments! Stochastic algorithms are hard to understand, especially when they include tuning parameters. I'd still like to know whassup with Google's LaMDA chatbot (see item 2 in this post).