Personal
DOC: Improving Long Story Coherence With Detailed Outline Control
Yang, Kevin, Klein, Dan, Peng, Nanyun, Tian, Yuandong
We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.
AI helps create 'final' Beatles song with John Lennon: Paul McCartney
New York City musician Jules Avalon reflects on the power of John Lennon and the Beatles at Strawberry Fields in Central Park, located across the street from where Lennon was murdered on Dec. 8, 1980. Paul McCartney announced Tuesday that artificial intelligence has been used to help create the "last" ever Beatles song, featuring the voice of John Lennon. In an interview with BBC Radio, McCartney, speaking about AI, said "we were able to use that kind of thing when Peter Jackson did the film 'Get Back' where it was us making the Let It Be album." "And he was able to extricate John's voice from a ropey little bit of cassette where it had John's voice and a piano – he could separate them with AI. They tell the machine 'that is a voice, this is a guitar, lose the guitar.' And he did that," McCartney continued.
Microsoft's Satya Nadella Is Betting Everything on AI
I never thought I'd write these words, but here goes. Satya Nadella--and Microsoft, the company he runs--are riding high on the buzz from its search engine. That's quite a contrast from the first time I spoke with Nadella, in 2009. Back then, he was not so well known, and he made a point of telling me about his origins. Born in Hyderabad, India, he attended grad school in the US and joined Microsoft in 1992, just as the firm was rising to power.
Assigning AI: Seven Approaches for Students, with Prompts
Mollick, Ethan, Mollick, Lilach
Abstract: This paper examines the transformative role of Large Language Models (LLMs) in education and their potential as learning tools, despite their inherent risks and limitations. The authors propose seven approaches for utilizing AI in classrooms: AI-tutor, AI-coach, AI-mentor, AI-teammate, AI-tool, AI-simulator, and AI-student, each with distinct pedagogical benefits and risks. The aim is to help students learn with and about AI, with practical strategies designed to mitigate risks such as complacency about the AI's output, errors, and biases. These strategies promote active oversight, critical assessment of AI outputs, and complementation of AI's capabilities with the students' unique insights. By challenging students to remain the "human in the loop", the authors aim to enhance learning outcomes while ensuring that AI serves as a supportive tool rather than a replacement. The proposed framework offers a guide for educators navigating the integration of AI-assisted learning in ...
Adding guardrails to advanced chatbots
Generative AI models continue to become more powerful. The launch of ChatGPT in November 2022 ushered in a new era of AI. ChatGPT and other similar chatbots have a range of capabilities, from answering student homework questions to creating music and art. There are already concerns that humans may be replaced by chatbots for a variety of jobs. Because of the wide spectrum of data chatbots are built on, we know that they will have human errors and human biases built into them. These biases may cause significant harm and/or inequity toward different subpopulations. To understand the strengths and weaknesses of chatbot responses, we present a position paper that explores different use cases of ChatGPT to determine the types of questions that are answered fairly and the types that still need improvement. We find that ChatGPT is a fair search engine for the tasks we tested; however, it has biases in both text generation and code generation. We find that ChatGPT is very sensitive to changes in the prompt, where small changes lead to different levels of fairness. This suggests that we need to immediately implement "corrections" or mitigation strategies in order to improve the fairness of these systems. We suggest different strategies to improve chatbots and also advocate for an impartial review panel that has access to the model parameters to measure the levels of different types of biases and then recommends safeguards that move toward responses that are less discriminatory and more accurate.
HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models
Controlling the text generated by language models and customizing the content has been a long-standing challenge. Existing prompting techniques proposed in pursuit of providing control are task-specific and lack generality; this leaves non-expert users facing an overwhelming number of choices when trying to find a suitable method for their task. The effort associated with those techniques, such as writing examples, explanations, and instructions, further limits their adoption among non-expert users. In this paper, we propose a simple prompting strategy, HELP ME THINK, where we encourage GPT3 to help non-expert users by asking a set of relevant questions and leveraging user answers to execute the task. We demonstrate the efficacy of our technique HELP ME THINK on a variety of tasks. Specifically, we focus on tasks that are hard for average humans and require significant thinking to perform. We hope our work will encourage the development of unconventional ways to harness the power of large language models.
AI jobs with mind-blowing paychecks of $375K a year
Harvey Castro talks about how AI could be used in cold cases and the symbiotic relationship between AI and a detective. There's no question that artificial intelligence is changing our lives. A bot that sounds almost human can author your emails, teach you a new language, book your trips or even be your friend. One woman I spoke with on my national radio show even married her AI companion.
It Was Founded in a Denny's. Now It's Worth More Than Facebook.
Nvidia, the company that dominates the market for graphics processing units, was once known mostly in the video game world. But these days, Nvidia GPUs are also the go-to source for the massive computing power needed to run generative A.I. systems--and the recent explosion in A.I. hype has propelled the company's stock into the stratosphere. Nvidia briefly hit a trillion-dollar valuation, putting itself in league with tech giants like Alphabet and Apple and launching a bit of a frenzy in the markets. Nvidia is looking like the first big stock win of the A.I. era, and investors are salivating. On Sunday's episode of What Next: TBD, I spoke with Don Clark, a freelance reporter who specializes in the chips industry, about how Nvidia rode the A.I. revolution, became the hottest chipmaker in the world, and made the entire A.I. craze suddenly seem very real.
Challenges and Opportunities for the Design of Smart Speakers
Advances in voice technology and voice user interfaces (VUIs) -- such as Alexa, Siri, and Google Home -- have opened up the potential for many new types of interaction. However, despite the potential of these devices reflected by the growing market and body of VUI research, there is a lingering sense that the technology is still underused. In this paper, we conducted a systematic literature review of 35 papers to identify and synthesize 127 VUI design guidelines into five themes. Additionally, we conducted semi-structured interviews with 15 smart speaker users to understand their use and non-use of the technology. From the interviews, we distill four design challenges that contribute the most to non-use. Based on their (non-)use, we identify four opportunity spaces for designers to explore, such as focusing on information support while multitasking (cooking, driving, childcare, etc.), incorporating users' mental models for smart speakers, and integrating calm design principles.
ChatGPT: Jack of all trades, master of none
Kocoń, Jan, Cichecki, Igor, Kaszyca, Oliwier, Kochanek, Mateusz, Szydło, Dominika, Baran, Joanna, Bielaniewicz, Julita, Gruza, Marcin, Janz, Arkadiusz, Kanclerz, Kamil, Kocoń, Anna, Koptyra, Bartłomiej, Mieleszczenko-Kowszewicz, Wiktoria, Miłkowski, Piotr, Oleksy, Marcin, Piasecki, Maciej, Radliński, Łukasz, Wojtasik, Konrad, Woźniak, Stanisław, Kazienko, Przemysław
OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning, like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss on semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. This applies especially to pragmatic NLP problems like emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established.