 Large Language Model


Chatbots Sound Like They're Posting on LinkedIn

The Atlantic - Technology

If you spend any time on the internet, you're likely now familiar with the gray-and-teal screenshots of AI-generated text. At first they were meant to illustrate ChatGPT's surprising competence at generating human-sounding prose, and then to demonstrate the occasionally unsettling answers that emerged once the general public could bombard it with prompts. OpenAI, the organization that is developing the tool, describes one of its biggest problems this way: "ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers." In layman's terms, the chatbot makes stuff up. As similar services, such as Google's Bard, have rushed their tools into public testing, their screenshots have demonstrated the same capacity for fabricating people, historical events, research citations, and more, and for rendering those falsehoods in the same confident, tidy prose.


NVIDIA made an open source tool for creating safer and more secure AI models

Engadget

Since March, NVIDIA has offered AI Foundations, a service that allows businesses to train large language models (LLMs) on their own proprietary data. Today the company is introducing NeMo Guardrails, a tool designed to help developers ensure their generative AI apps are accurate, appropriate and safe. NeMo Guardrails allows software engineers to enforce three different kinds of limits on their in-house LLMs. Specifically, firms can set "topical guardrails" that will prevent their apps from addressing subjects they weren't trained to tackle. For instance, NVIDIA suggests a customer service chatbot would, with the help of its software, decline to answer a question about the weather.


The 'Don't Look Up' Thinking That Could Doom Us With AI

TIME - Tech

Many companies are working to build AGI (artificial general intelligence), defined as "AI that can learn and perform most intellectual tasks that human beings can, including AI development." Below we'll discuss why this may rapidly lead to superintelligence, defined as "general intelligence far beyond human level". I'm often told that AGI and superintelligence won't happen because it's impossible: human-level intelligence is something mysterious that can only exist in brains. Such carbon chauvinism ignores a core insight from the AI revolution: that intelligence is all about information processing, and it doesn't matter whether the information is processed by carbon atoms in brains or by silicon atoms in computers. AI has been relentlessly overtaking humans on task after task, and I invite carbon chauvinists to stop moving the goalposts and publicly predict which tasks AI will never be able to do.


TABLET: Learning From Instructions For Tabular Data

arXiv.org Artificial Intelligence

Acquiring high-quality data is often a significant challenge in training machine learning (ML) models for tabular prediction, particularly in privacy-sensitive and costly domains like medicine and finance. Providing natural language instructions to large language models (LLMs) offers an alternative solution. However, it is unclear how effectively instructions leverage the knowledge in LLMs for solving tabular prediction problems. To address this gap, we introduce TABLET, a benchmark of 20 diverse tabular datasets annotated with instructions that vary in their phrasing, granularity, and technicality. Additionally, TABLET includes the instructions' logic and structured modifications to the instructions. We find in-context instructions increase zero-shot F1 performance for Flan-T5 11b by 44% on average and 13% for ChatGPT on TABLET. Also, we explore the limitations of using LLMs for tabular prediction in our benchmark by evaluating instruction faithfulness. We find LLMs often ignore instructions and fail to predict specific instances correctly, even with examples. Our analysis on TABLET shows that, while instructions help LLM performance, learning from instructions for tabular data requires new capabilities.


Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

arXiv.org Artificial Intelligence

Concept Bottleneck Models (CBMs) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and are the first to show how to construct high-performance CBMs, with accuracy similar to black box models, without manual specification. Our approach, Language Guided Bottlenecks (LaBo), leverages a language model, GPT-3, to define a large space of possible bottlenecks. Given a problem domain, LaBo uses GPT-3 to produce factual sentences about categories to form candidate concepts. LaBo efficiently searches possible bottlenecks through a novel submodular utility that promotes the selection of discriminative and diverse information. Ultimately, GPT-3's sentential concepts can be aligned to images using CLIP to form a bottleneck layer. Experiments demonstrate that LaBo is a highly effective prior for concepts important to visual recognition. In an evaluation on 11 diverse datasets, LaBo bottlenecks excel at few-shot classification: they are 11.7% more accurate than black box linear probes at 1 shot and comparable with more data. Overall, LaBo demonstrates that inherently interpretable models can be widely applied at similar, or better, performance than black box approaches.
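To illustrate the general idea of selecting concepts that are both discriminative and diverse, here is a minimal sketch of greedy selection under a generic submodular objective. It is not LaBo's actual utility, which the abstract does not specify. The function name `greedy_select`, the per-concept `discriminability` scores, the pairwise `similarity` matrix, and the `diversity_weight` parameter are all assumptions made for this example.

```python
from typing import List, Sequence


def greedy_select(
    discriminability: Sequence[float],
    similarity: Sequence[Sequence[float]],
    k: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily pick up to k concept indices (illustrative objective only).

    Assumed objective: sum of discriminability scores of the chosen concepts
    plus a facility-location coverage term, which rewards a new concept only
    for the part of the candidate pool it covers better than earlier picks.
    """
    n = len(discriminability)
    selected: List[int] = []
    # coverage[i] = highest similarity between candidate i and any chosen concept
    coverage = [0.0] * n

    for _ in range(min(k, n)):
        best_idx, best_gain = -1, float("-inf")
        for c in range(n):
            if c in selected:
                continue
            # Marginal coverage gain: only improvements over current coverage count,
            # so near-duplicates of already chosen concepts add very little.
            cov_gain = sum(max(similarity[i][c] - coverage[i], 0.0) for i in range(n))
            gain = discriminability[c] + diversity_weight * cov_gain
            if gain > best_gain:
                best_idx, best_gain = c, gain
        selected.append(best_idx)
        coverage = [max(coverage[i], similarity[i][best_idx]) for i in range(n)]

    return selected


# Hypothetical toy data: concepts 0 and 1 are near-duplicates, concept 2 is distinct.
scores = [0.9, 0.85, 0.5]
sims = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = greedy_select(scores, sims, k=2)
```

With this toy data the second pick is concept 2 rather than the higher-scoring but redundant concept 1, which is the kind of trade-off a diversity-promoting utility is meant to produce.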


On the Computation of Meaning, Language Models and Incomprehensible Horrors

arXiv.org Artificial Intelligence

We integrate foundational theories of meaning with a mathematical formalism of artificial general intelligence (AGI) to offer a comprehensive mechanistic explanation of meaning, communication, and symbol emergence. This synthesis holds significance for both AGI and broader debates concerning the nature of language, as it unifies pragmatics, logical truth conditional semantics, Peircean semiotics, and a computable model of enactive cognition, addressing phenomena that have traditionally evaded mechanistic explanation. By examining the conditions under which a machine can generate meaningful utterances or comprehend human meaning, we establish that the current generation of language models do not possess the same understanding of meaning as humans nor intend any meaning that we might attribute to their responses. To address this, we propose simulating human feelings and optimising models to construct weak representations. Our findings shed light on the relationship between meaning and intelligence, and how we can build machines that comprehend and intend meaning.


AI-assisted coding: Experiments with GPT-4

arXiv.org Artificial Intelligence

Recent developments in artificial intelligence, particularly through large language models, have enabled the automated generation of computer code (Chen et al. 2021; Bubeck et al. 2023). In particular, GPT-4 has enabled human-level performance on a set of coding challenges that are outside of the training set of the model (Bubeck et al. 2023). In addition, automated coding assistants (particularly GitHub Copilot) have become integrated into common development environments and are widely used, with some evidence that they can significantly improve coding productivity. The performance of these models is also raising important questions regarding coding education, given that the current models can easily complete most coding problem sets used in introductory programming courses (Finnie-Ansley et al. 2022). In the present paper we explore some of the implications of AI-assisted coding using GPT-4, in a more qualitative way than previous benchmarking assessments. First we examine the experience of interactive coding and debugging using the ChatGPT interface to GPT-4 on a set of data science coding problems.


AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

arXiv.org Artificial Intelligence

Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Despite the recent success, current LLMs are not capable of processing complex audio information or conducting spoken conversations (like Siri or Alexa). In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) the input/output interface (ASR, TTS) to support spoken dialogue. With an increasing demand to evaluate multi-modal LLMs of human intention understanding and cooperation with foundation models, we outline the principles and processes and test AudioGPT in terms of consistency, capability, and robustness. Experimental results demonstrate the capabilities of AudioGPT in solving AI tasks with speech, music, sound, and talking head understanding and generation in multi-round dialogues, which empower humans to create rich and diverse audio content with unprecedented ease.


A Theory on Adam Instability in Large-Scale Machine Learning

arXiv.org Artificial Intelligence

Training instability reported by Chowdhery et al. [2022] is an interesting phenomenon that has only been reported for large language models trained on the order of a trillion tokens, posing a threat to further scaling of AI systems. Chowdhery et al. [2022] observed dozens of spikes in the loss curve throughout training. To mitigate the issue, they restarted training from a checkpoint roughly 100 steps before the spike started and skipped roughly 200-500 data batches, in order to exclude batches that were seen right before and during the spike. In that case, the spike of the loss value did not repeat. The spikes were also not observed when the skipped data was fed through the model again after the aforementioned mitigation, which implies that the data itself did not cause the spike, but rather an interference of the data batch with the state of the model training run. The purpose of this work is to rigorously reproduce the experiment with a different hardware and software setup, come up with an explanation for the observed behavior supported by empirical evidence and theoretical arguments, and propose alternative ways of mitigating the issue. Loss spikes are difficult to study because any reproduction of these spikes at a smaller scale is not necessarily caused by or remediated by the same factors as at larger scales. We therefore analyze large-scale language modeling experiments, training four models between 7 billion and 546 billion parameters. The models are decoder-only transformers [Brown et al., 2020, Smith et al., 2022] with different depth and embedding dimensions, trained using the AdamW [Loshchilov and Hutter, 2017] algorithm with a linear learning rate schedule.
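The restart-and-skip mitigation described above can be sketched as a small planning helper. This is only an illustration under stated assumptions: one data batch is consumed per training step, batch indices equal step indices, and a checkpoint exists at the chosen step. The names `RestartPlan` and `plan_restart` and the default of 300 skipped batches (a value inside the reported 200-500 range) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartPlan:
    checkpoint_step: int      # step whose checkpoint is reloaded
    first_skipped_batch: int  # index of the first batch that is dropped
    resume_batch: int         # index of the first batch fed after the restart


def plan_restart(
    spike_step: int,
    rewind_steps: int = 100,   # "roughly 100 steps before the spike"
    skip_batches: int = 300,   # assumed value within the reported 200-500 range
) -> RestartPlan:
    """Plan a rewind to an earlier checkpoint that skips the batches around a spike.

    Assumes one batch per step, so the batch index equals the step index.
    """
    if spike_step < 0 or rewind_steps < 0 or skip_batches < 0:
        raise ValueError("spike_step, rewind_steps and skip_batches must be non-negative")

    # Never rewind past the start of training.
    checkpoint_step = max(0, spike_step - rewind_steps)
    # Skipping starts at the reloaded step, so the dropped window covers the
    # batches seen right before the spike and, if it is long enough, during it.
    resume_batch = checkpoint_step + skip_batches
    return RestartPlan(
        checkpoint_step=checkpoint_step,
        first_skipped_batch=checkpoint_step,
        resume_batch=resume_batch,
    )


# Example: a spike first observed at step 1000.
plan = plan_restart(spike_step=1000)
```

For a spike at step 1000 with these defaults, the plan reloads the checkpoint from step 900, drops batches 900 through 1199, and resumes with batch 1200.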


StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects

arXiv.org Artificial Intelligence

Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures given partial-view point clouds and high-level language goals, such as "set the table". Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks using one model. StructDiffusion even improves the success rate of assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model trained on specific structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. Importantly, we show how integrating both a diffusion model and a collision-discriminator model allows for improved generalization over other methods when rearranging previously-unseen objects. For videos and additional results, see our website: https://structdiffusion.github.io/.