Personal
Exploring Practitioner Perspectives On Training Data Attribution Explanations
Nguyen, Elisa, Kortukov, Evgenii, Song, Jean Y., Oh, Seong Joon
Explainable AI (XAI) aims to provide insight into opaque model reasoning to humans and as such is an interdisciplinary field by nature. In this paper, we interviewed 10 practitioners to understand the possible usability of training data attribution (TDA) explanations and to explore the design space of such an approach. We confirmed that training data quality is often the most important factor for high model performance in practice and model developers mainly rely on their own experience to curate data. End-users expect explanations to enhance their interaction with the model and do not necessarily prioritise but are open to training data as a means of explanation. Among our participants, we found that TDA explanations are not well-known and therefore not used. We urge the community to focus on the utility of TDA techniques from the human-machine collaboration perspective and broaden the TDA evaluation to reflect common use cases in practice.
As OpenAI chaos mounts, talks to bring back Sam Altman continue
Altman's sudden move to join Microsoft is not finalized, Satya Nadella, CEO of Microsoft, signaled in an interview with CNBC on Monday. A person familiar with the matter said Altman would only return to OpenAI if the board members who ousted him stepped down. In the CNBC interview on Monday afternoon, Nadella sought to assure customers and investors that his company was on solid ground no matter the outcome. He left the door open for Altman to return to OpenAI or continue on as an AI leader at Microsoft, even though he announced late Sunday night that Altman was coming to Microsoft. "I'm open to both options," Nadella said in the interview with CNBC.
Christopher Nolan on the Promise and Peril of Technology
By the time I sat down with Christopher Nolan in his posh hotel suite not far from the White House, I guessed that he was tired of Washington, D.C. The day before, he'd toured the Oval Office and had lunch on Capitol Hill. Later that night, I'd watched him receive an award from the Federation of American Scientists, an organization that counts Robert Oppenheimer, the subject of Nolan's most recent film, among its founders. He'd endured a joke, repeated too many times by Senate Majority Leader Chuck Schumer, about the subject of his next film--"It's another biopic: Schumer." The award was sitting on an end table next to Nolan, who was dressed in brown slacks, a gray vest, and a navy suit jacket--his Anglo-formality undimmed by decades spent living in Los Angeles. "It's heavy, and glass, and good for self-defense," he said of the award, while filling his teacup.
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Rein, David, Hou, Betty Li, Stickland, Asa Cooper, Petty, Jackson, Pang, Richard Yuanzhe, Dirani, Julien, Michael, Julian, Bowman, Samuel R.
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
System 2 Attention (is something you might need too)
Weston, Jason, Sukhbaatar, Sainbayar
Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information: QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.
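The two-step procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the function name `s2a_answer`, and both prompt templates are hypothetical stand-ins, since the abstract does not give the actual prompts.

```python
from typing import Callable

# Hypothetical prompt wording; the abstract does not state the authors' prompts.
REWRITE_PROMPT = (
    "Rewrite the following text, keeping only the parts that are relevant "
    "and unbiased for answering the question it contains.\n\nText:\n{context}"
)
ANSWER_PROMPT = "{context}\n\nAnswer the question above."


def s2a_answer(llm: Callable[[str], str], context: str) -> str:
    """Two-step S2A-style call: regenerate the context, then answer from it."""
    # Step 1: ask the model to regenerate the context without irrelevant content.
    regenerated = llm(REWRITE_PROMPT.format(context=context))
    # Step 2: answer using only the regenerated context, not the original input.
    return llm(ANSWER_PROMPT.format(context=regenerated))
```

Any function that maps a prompt string to a completion string can be passed as `llm`. The point of the design is that the second call never sees the original context, so opinions or distractors removed in the first step cannot influence the final response.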
Lost in the Middle: How Language Models Use Long Contexts
Liu, Nelson F., Lin, Kevin, Hewitt, John, Paranjape, Ashwin, Bevilacqua, Michele, Petroni, Fabio, Liang, Percy
While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
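To make the key-value retrieval setup concrete, the sketch below builds a prompt in which the position of the relevant pair can be varied. It is an illustration under assumptions: the function name `build_kv_prompt`, the use of random UUIDs as keys and values, and the prompt wording are all hypothetical and are not taken from the abstract.

```python
import json
import uuid


def build_kv_prompt(num_pairs: int, target_index: int) -> tuple[str, str, str]:
    """Return (prompt, target_key, target_value) with the target pair at target_index."""
    if not 0 <= target_index < num_pairs:
        raise ValueError("target_index must be in [0, num_pairs)")
    # Random keys and values, so the answer can only come from the context.
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(num_pairs)]
    target_key, target_value = pairs[target_index]
    # dict preserves insertion order, so the serialized position matches target_index.
    data = json.dumps(dict(pairs), indent=1)
    prompt = f'JSON data:\n{data}\n\nWhat is the value for key "{target_key}"?'
    return prompt, target_key, target_value
```

Sweeping `target_index` from 0 to `num_pairs - 1` while holding `num_pairs` fixed, and recording whether a model's reply contains `target_value`, gives accuracy as a function of position, which is the kind of measurement the abstract describes.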
Interview with Dautzenberg Roman: #IROS2023 Best Paper Award on Mobile Manipulation sponsored by OMRON Sinic X Corp.
Congratulations to Dautzenberg Roman and his team of researchers, who won the IROS 2023 Best Paper Award on Mobile Manipulation sponsored by OMRON Sinic X Corp. for their paper "A perching and tilting aerial robot for precise and versatile power tool work on vertical walls". Below, the authors tell us more about their work, the methodology, and what they are planning next. Our paper shows an aerial robot (think "drone") which can exert large forces in the horizontal direction, i.e. onto walls. This is a difficult task, as UAVs usually rely on thrust vectoring to apply horizontal forces and thus can only apply small forces before losing control authority. By perching onto walls, our system no longer needs the propulsion to remain at a desired site.
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Qin, Chengwei, Zhang, Aston, Zhang, Zhuosheng, Chen, Jiaao, Yasunaga, Michihiro, Yang, Diyi
Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community due to the fact that it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning) while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.
Who Is Mira Murati, OpenAI's New Interim CEO?
Until the dramatic departure of OpenAI's cofounder and CEO Sam Altman Friday, Mira Murati was its chief technology officer--but you could also call her its minister of truth. In addition to heading the teams that develop tools such as ChatGPT and Dall-E, it's been her job to make sure those products don't mislead people, show bias, or snuff out humanity altogether. This interview was conducted in July 2023 for WIRED's cover story on OpenAI. It is being published today after Sam Altman's sudden departure to provide a glimpse at the thinking of the powerful AI company's new boss. Steven Levy: How did you come to join OpenAI?
OpenAI CEO Sam Altman ousted as 'board no longer has confidence' in his leadership
In a surprise shakeup of its C-suite Friday, OpenAI's board of directors announced that CEO Sam Altman has been fired and will be leaving both the company and the board, effective immediately. Chief Technology Officer Mira Murati has been named interim CEO. Altman's ouster reportedly follows an internal "deliberative review process" which found he had not been "consistently candid in his communications with the board, hindering its ability to exercise its responsibilities," the company announced. As such, "the board no longer has confidence in his ability to continue leading OpenAI." OpenAI, which owns popular AI chatbot ChatGPT, thanked Altman for his "many contributions to the founding and growth of OpenAI," but believes that "as the leader of the company's research, product, and safety functions, Mira is exceptionally qualified to step into the role of interim CEO." The board added it has "the utmost confidence in her ability to lead OpenAI during this transition period."