Goto

Collaborating Authors

 Large Language Model


WinoDict: Probing language models for in-context word acquisition

arXiv.org Artificial Intelligence

We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary definition of the new word given in the prompt. This benchmark addresses word acquisition, one important aspect of the diachronic degradation known to afflict LLMs. As LLMs are frozen in time at the moment they are trained, they are normally unable to reflect the way language changes over time. We show that the accuracy of LLMs compared to the original Winograd tasks decreases radically in our benchmark, thus identifying a limitation of current models and providing a benchmark to measure future improvements in LLMs ability to do in-context learning.


The secret to Sparrow, DeepMind's latest chatbot: Humans

#artificialintelligence

DeepMind has trained a chatbot named Sparrow to be less toxic and more accurate than other systems, by using a mix of human feedback and Google search suggestions. Chatbots are typically powered by large language models (LLMs) trained on text scraped from the internet. These models are capable of generating paragraphs of prose that are, at a surface level at least, coherent and grammatically correct, and can respond to questions or written prompts from users. This software, however, often picks up bad traits from the source material resulting in it regurgitating offensive, racist, and sexist views, or spewing fake news or conspiracies that are often found on social media and internet forums. That said, these bots can be guided to generate safer output.


Can Transformer Models Effectively Detect Software Aspects in StackOverflow Discussion?

arXiv.org Artificial Intelligence

Dozens of new tools and technologies are being incorporated to help developers, which is becoming a source of consternation as they struggle to choose one over the others. For example, there are at least ten frameworks available to developers for developing web applications, posing a conundrum in selecting the best one that meets their needs. As a result, developers are continuously searching for all of the benefits and drawbacks of each API, framework, tool, and so on. One of the typical approaches is to examine all of the features through official documentation and discussion. This approach is time-consuming, often makes it difficult to determine which aspects are the most important to a particular developer and whether a particular aspect is important to the community at large. In this paper, we have used a benchmark API aspects dataset (Opiner) collected from StackOverflow posts and observed how Transformer models (BERT, RoBERTa, DistilBERT, and XLNet) perform in detecting software aspects in textual developer discussion with respect to the baseline Support Vector Machine (SVM) model. Through extensive experimentation, we have found that transformer models improve the performance of baseline SVM for most of the aspects, i.e., `Performance', `Security', `Usability', `Documentation', `Bug', `Legal', `OnlySentiment', and `Others'. However, the models fail to apprehend some of the aspects (e.g., `Community' and `Potability') and their performance varies depending on the aspects. Also, larger architectures like XLNet are ineffective in interpreting software aspects compared to smaller architectures like DistilBERT.


NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries

arXiv.org Artificial Intelligence

We develop NL2INTERFACE to explore the potential of generating usable interactive multi-visualization interfaces from natural language queries. With NL2INTERFACE, users can directly write natural language queries to automatically generate a fully interactive multi-visualization interface without any extra effort of learning a tool or programming language. Further, users can interact with the interfaces to easily transform the data and quickly see the results in the visualizations.


AI model from OpenAI automatically recognizes speech and translates it to English

#artificialintelligence

On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more. OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." It can also detect the spoken language and translate it to English.


Optimizing TF, XLA and JAX for LLM Training on NVIDIA GPUs

#artificialintelligence

Posted by Douglas Yarrington (Google TPgM), James Rubin (Google PM), Neal Vaidya (NVIDIA TME), Jay Rodge (NVIDIA PMM)Together, NVIDIA and Google are delighted to announce new milestones and plans to optimize TensorFlow and JAX for the Ampere and recently announced Hopper GPU architectures by leveraging the power of XLA: a performant, flexible and extensible ML compiler built by Google.


AlphaFold developers win US$3-million Breakthrough Prize

#artificialintelligence

Demis Hassabis (left) and John Jumper (right) from DeepMind developed AlphaFold, an AI that can predict the structure of proteins.Credit: Breakthrough Prize The researchers behind the AlphaFold artificial-intelligence (AI) system have won one of this year's US$3-million Breakthrough prizes -- the most lucrative awards in science. Demis Hassabis and John Jumper, both at DeepMind in London, were recognized for creating the tool that has predicted the 3D structures of almost every known protein on the planet. "Few discoveries so dramatically alter a field, so rapidly," says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. "It's really changed the practice of structural biology, both computational and experimental." The award was one of five Breakthrough prizes -- awarded for achievements in life sciences, physics and mathematics -- announced on 22 September.


The Download: YouTube's deadly crafts, and DeepMind's new chatbot

MIT Technology Review

Ann Reardon is probably the last person whose content you'd expect to be banned from YouTube. A former Australian youth worker and a mother of three, she's been teaching millions of loyal subscribers how to bake since 2011. But the removal email was referring to a video that was not Reardon's typical sugar-paste fare. Since 2018, Reardon has used her platform to warn viewers about dangerous new "craft hacks" that are sweeping YouTube, tackling unsafe activities such as poaching eggs in a microwave, bleaching strawberries, and using a Coke can and a flame to pop popcorn. The most serious is "fractal wood burning", which involves shooting a high-voltage electrical current across dampened wood to burn a twisting, turning branch-like pattern in its surface. The practice has killed at least 33 people since 2016.


Promptagator: Few-shot Dense Retrieval From 8 Examples

arXiv.org Artificial Intelligence

Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we suggest to work on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO like ColBERT v2 (Santhanam et al., 2022) by more than 1.2 nDCG on average on 11 retrieval sets. Further training standard-size re-rankers using the same generated data yields another 5.0 point nDCG improvement. Our studies determine that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given. Recently, major progress has been made on neural retrieval models such as dual encoders, which can retrieve knowledge from a large collection of documents containing millions to billions of passages (Yih et al., 2011; Lee et al., 2019; Karpukhin et al., 2020). However, Thakur et al. (2021) recently proposed the BEIR heterogeneous retrieval benchmark, and showed that it is still difficult for neural retrievers to perform well on a wide variety of retrieval tasks that lack dedicated training data. Thus, previous approaches focus on transferring knowledge from question answering (QA) datasets such as MS MARCO (Nguyen et al., 2016). To best transfer from QA datasets, expressive retrievers are developed that allow fine-grained token-level interaction such as ColBERT (Khattab & Zaharia, 2020; Santhanam et al., 2022) and SPLADE (Formal et al., 2021) but with higher inference cost.


Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision

arXiv.org Artificial Intelligence

Revision is an essential part of the human writing process. It tends to be strategic, adaptive, and, more importantly, iterative in nature. Despite the success of large language models on text revision tasks, they are limited to non-iterative, one-shot revisions. Examining and evaluating the capability of large language models for making continuous revisions and collaborating with human writers is a critical step towards building effective writing assistants. In this work, we present a human-in-the-loop iterative text revision system, Read, Revise, Repeat (R3), which aims at achieving high quality text revisions with minimal human efforts by reading model-generated revisions and user feedbacks, revising documents, and repeating human-machine interactions. In R3, a text revision model provides text editing suggestions for human writers, who can accept or reject the suggested edits. The accepted edits are then incorporated into the model for the next iteration of document revision. Writers can therefore revise documents iteratively by interacting with the system and simply accepting/rejecting its suggested edits until the text revision model stops making further revisions or reaches a predefined maximum number of revisions. Empirical experiments show that R3 can generate revisions with comparable acceptance rate to human writers at early revision depths, and the human-machine interaction can get higher quality revisions with fewer iterations and edits. The collected human-model interaction dataset and system code are available at \url{https://github.com/vipulraheja/IteraTeR}. Our system demonstration is available at \url{https://youtu.be/lK08tIpEoaE}.