Large Language Model
OpenAI debuts ChatGPT and GPT-3.5 series as GPT-4 rumors fly
As GPT-4 rumors fly around NeurIPS 2022 this week in New Orleans (including whispers that details about GPT-4 will be revealed there), OpenAI has managed to make plenty of news in the meantime. On Monday, the company announced a new model in the GPT-3 family of AI-powered large language models, text-davinci-003, part of what it calls the "GPT-3.5 series," that reportedly improves on its predecessors by handling more complex instructions and producing higher-quality, longer-form content. Unlike davinci-002, which uses supervised fine-tuning on human-written demonstrations and highly scored model samples to improve generation quality, davinci-003 is a true reinforcement learning with human feedback (RLHF) model. Meanwhile, today OpenAI launched an early demo of ChatGPT, another part of the GPT-3.5 series that is an interactive, conversational model whose dialogue format "makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests."
While everyone waits for GPT-4, OpenAI is still fixing its predecessor
ChatGPT appears to address some of these problems, but it is far from a full fix--as I found when I got to try it out. This suggests that GPT-4 won't be either. In particular, ChatGPT--like Galactica, Meta's large language model for science, which the company took offline earlier this month after just three days--still makes stuff up. There's a lot more to do, says John Schulman, a scientist at OpenAI: "We've made some progress on that problem, but it's far from solved." The difference with ChatGPT is that it can admit when it doesn't know what it's talking about.
Best NLP Papers -- October 2022
This roundup highlights some interesting NLP papers from October 2022 around language model capabilities. This article's title and TL;DR have been generated with Cohere. NLP is evolving at a rapid pace, and every month we discover new capabilities. Large language models, like those built by Cohere, are being applied to use cases that we couldn't have imagined even just a few months ago.
AI Is Terrible at Detecting Misinformation. It Doesn't Have to Be. - Nautilus
Elon Musk has said he wants to make Twitter "the most accurate source of information in the world." I am not convinced that he means it, but whether he does or not, he's going to have to work on the problem; a lot of advertisers have already made that pretty clear. If he does nothing, they are out. And Musk has continued to tweet in ways that seem to indicate that he is generally on board with some kind of content moderation. The tech journalist Kara Swisher has speculated that Musk wants AI to help; on Twitter she wrote, rather plausibly, that Musk "is hoping to build an AI system that replaces [fired moderators] that will not work well now but will presumably get better."
Researchers Win Gordon Bell Special Prize for Models that Track COVID Variants
Members of the GenSLMs team received the Gordon Bell Special Prize for HPC-Based COVID-19 Research at the SC22 conference. Scientists from Argonne National Laboratory and a team of collaborators have won the 2022 ACM Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research for their method of quickly identifying how a virus evolves. Their work in training large language models (LLMs) to discover variants of SARS-CoV-2 has implications for biology beyond COVID-19. The researchers leveraged Argonne's supercomputing and AI resources to develop and apply LLMs toward tracking how a virus can mutate into more dangerous or more transmissible variants, known as variants of concern (VOCs). Existing methods to track VOCs can be slow.
Google's Code-as-Policies Lets Robots Write Their Own Code
Researchers from Google's Robotics team have open-sourced Code-as-Policies (CaP), a robot control method that uses a large language model (LLM) to generate robot-control code that achieves a user-specified goal. CaP uses a hierarchical prompting technique for code generation that outperforms previous methods on the HumanEval code-generation benchmark. The technique and experiments were described in a paper published on arXiv. CaP differs from previous attempts to use LLMs to control robots: instead of generating a sequence of high-level steps or policies to be invoked by the robot, CaP directly generates Python code for those policies. The Google team developed a set of prompting techniques that improved code generation, including a new hierarchical prompting method.
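To make the idea of hierarchical code generation concrete, here is a minimal Python sketch. It is not Google's implementation: it shows one plausible reading of "hierarchical prompting," in which generated code may call helper functions that do not exist yet and the generator is then asked for each missing helper in turn. The `fake_llm` function, the canned `FAKE_LLM` table, and the block-stacking task are all hypothetical stand-ins for a real model and a real robot API.

```python
import ast
import builtins

# Hypothetical stand-in for an LLM: maps a prompt to canned Python source.
FAKE_LLM = {
    "stack the blocks": (
        "def run(blocks):\n"
        "    return place_in_order(sort_by_size(blocks))\n"
    ),
    "sort_by_size": (
        "def sort_by_size(blocks):\n"
        "    return sorted(blocks, key=lambda b: b['size'], reverse=True)\n"
    ),
    "place_in_order": (
        "def place_in_order(blocks):\n"
        "    return [b['name'] for b in blocks]\n"
    ),
}


def fake_llm(prompt):
    """Return canned code for a prompt (assumption: a real LLM call goes here)."""
    return FAKE_LLM[prompt]


def undefined_calls(source, known):
    """List plain function names called in `source` that are not defined
    there, are not Python builtins, and are not already in `known`."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(called - defined - set(known) - set(dir(builtins)))


def generate(prompt, namespace):
    """Generate code for `prompt`, first generating any missing helpers."""
    source = fake_llm(prompt)
    for name in undefined_calls(source, namespace):
        generate(name, namespace)  # one level down the hierarchy
    exec(source, namespace)  # define the generated function(s)


namespace = {}
generate("stack the blocks", namespace)
blocks = [
    {"name": "small", "size": 1},
    {"name": "large", "size": 3},
    {"name": "medium", "size": 2},
]
result = namespace["run"](blocks)
print(result)  # ['large', 'medium', 'small']
```

In a real system the generated code would call robot perception and control primitives rather than list operations, and executing model-written code with `exec` would need sandboxing and safety checks that this sketch omits.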
Effective Altruism Is Pushing a Dangerous Brand of 'AI Safety'
Throughout my two decades in Silicon Valley, I have seen effective altruism (EA)--a movement consisting of an overwhelmingly white male group based largely out of Oxford University and Silicon Valley--gain alarming levels of influence. EA is currently being scrutinized due to its association with Sam Bankman-Fried's crypto scandal, but less has been written about how the ideology is now driving the research agenda in the field of artificial intelligence (AI), creating a race to proliferate harmful systems, ironically in the name of "AI safety." EA is defined by the Center for Effective Altruism as "an intellectual project, using evidence and reason to figure out how to benefit others as much as possible." And "evidence and reason" have led many EAs to conclude that the most pressing problem in the world is preventing an apocalypse where an artificially generally intelligent being (AGI) created by humans exterminates us. To prevent this apocalypse, EA's career advice center, 80,000 Hours, lists "AI safety technical research" and "shaping future governance of AI" as the top two recommended careers for EAs to go into, and the billionaire EA class funds initiatives attempting to stop an AGI apocalypse.
OpenAI Turns to Davinci to Make GPT-3 Better
The OpenAI API adds 'text-davinci-003' to its list of main GPT-3 models. The new model can do all the tasks the other models can do while also delivering higher-quality, longer output and better instruction-following. Davinci is the most capable model and can perform all the tasks the other models can, often with fewer instructions. It works particularly well on tasks requiring in-depth knowledge of the subject matter, such as summarising texts for a specific audience and creative content development. For example, it is good at deducing solutions to various logical problems and outlining character motivations. However, these capabilities require more computing resources, leading to higher costs per API call and lower speed than other models.