Large Language Model
AI should be licensed like medicines or nuclear power, Labour suggests
The UK should bar technology developers from working on advanced artificial intelligence tools unless they have a licence to do so, Labour has said. Ministers should introduce much stricter rules around companies training their AI products on vast datasets of the kind used by OpenAI to build ChatGPT, Lucy Powell, Labour's digital spokesperson, told the Guardian. Her comments come amid a rethink at the top of government over how to regulate the fast-moving world of AI, with the prime minister, Rishi Sunak, acknowledging it could pose an "existential" threat to humanity. One of the government's advisers on artificial intelligence also said on Monday that humanity could have only two years before AI is able to outwit people, the latest in a series of stark warnings about the threat posed by the fast-developing technology. Powell said: "My real point of concern is the lack of any regulation of the large language models that can then be applied across a range of AI tools, whether that's governing how they are built, how they are managed or how they are controlled."
Daiwa gives workers OK to "freely use" ChatGPT as part of tech drive
Daiwa Securities Group employees are widely using an artificial intelligence-powered chatbot in Japan as the nation's second-largest brokerage follows global banks in exploring the potential of rapidly evolving technologies. Chief Executive Officer Seiji Nakata said the Tokyo-based firm started an experiment in April that gave around 9,000 workers in Japan the go-ahead to "freely use" ChatGPT. Daiwa has also been strengthening the recruitment of science graduates to develop high-tech experts in house, he said in an interview. The move comes as an AI revolution unfolds on Wall Street in response to widening interest in the technology and its likely business impact. Deutsche Bank is using it to scan wealthy client portfolios, while JPMorgan Chase & Co. is advertising for more AI roles than any of its rivals.
Chegg Embraced AI. ChatGPT Ate Its Lunch Anyway
Investors were surprised when the online education company Chegg last month revealed that ChatGPT was hurting subscriber growth--the company lost half of its market value overnight. But long before Chegg became an index case for the disruptive force of ChatGPT, its top brass had heard plenty of warnings about the threat and opportunity of generative AI. For years, on afternoon walks outside Chegg's Silicon Valley headquarters, former executives say they had discussed someday slashing costs by tapping AI programs to replace an army of instructors that answer student questions and draft flashcards. Matthew Ramirez, a product leader who left Chegg two years ago, says he even advised CEO Dan Rosensweig in 2020 that generative AI would be the bus that ran down Chegg if it didn't prepare itself. And just weeks after OpenAI launched ChatGPT last November, a source familiar with the exchange says, one Chegg executive had the bot write an email to Rosensweig urging him to develop a ChatGPT rival.
ChatGPT has a problem no one wants to talk about
That sort of computational power requires GPUs, or graphics processing units, that were first made for video games but were found to be the only chips that could handle such heavy computer tasks as large language models. Currently, just one company, Nvidia, sells the best of those, for which it charges tens of thousands of dollars. Nvidia's valuation recently rocketed to $1 trillion on the anticipated sales. The Taiwan-based company that manufactures many of those chips, TSMC, has likewise soared in value.
The Creator of ChatGPT on the Rise of Artificial Intelligence
David Remnick sits down with Sam Altman, the C.E.O. of OpenAI, which created ChatGPT, GPT-4, and other artificial-intelligence programs. A.I. is a tool, Altman emphasizes, that streamlines human work and quickens the pace of scientific advancement. But he claims to empathize with concerns about the emerging technology. "Even if you don't believe in any of the sci-fi stories," he tells Remnick, "you could still be freaked out about the level of change that this is going to bring society and the compressed time frame in which that's going to happen."
Information Flow Control in Machine Learning through Modular Model Architecture
Tiwari, Trishita, Gururangan, Suchin, Guo, Chuan, Hua, Weizhe, Kariyappa, Sanjay, Gupta, Udit, Xiong, Wenjie, Maeng, Kiwan, Lee, Hsien-Hsin S., Suh, G. Edward
In today's machine learning (ML) models, any part of the training data can affect the model's output. This lack of control over information flow from training data to model output is a major obstacle in training models on sensitive data when access control only allows individual users to access a subset of data. To enable secure machine learning for access-controlled data, we propose the notion of information flow control for machine learning, and develop a secure Transformer-based language model based on the Mixture-of-Experts (MoE) architecture. The secure MoE architecture controls information flow by limiting the influence of training data from each security domain to a single expert module, and only enabling a subset of experts at inference time based on an access control policy. The evaluation using a large corpus of text data shows that the proposed MoE architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (up to 37%) by enabling training on access-controlled data.
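The isolation idea in this abstract can be illustrated with a toy sketch. It is not the paper's Transformer-based model: each "expert" here is only a unigram word counter built from one security domain, and the names `train_expert`, `predict`, the domains, and the `policy` mapping are all hypothetical. It shows only that enabling a subset of experts at inference keeps data from other domains out of the output.

```python
from collections import Counter


def train_expert(texts):
    """'Train' a toy expert: unigram counts from ONE security domain only."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def predict(experts, allowed_domains):
    """Combine only the experts the access policy enables.

    Returns a word -> probability mapping. Experts for domains that are not
    in `allowed_domains` are never read, so their data cannot influence it.
    Assumes at least one allowed domain with non-empty data.
    """
    combined = Counter()
    for domain in allowed_domains:
        combined.update(experts[domain])
    total = sum(combined.values())
    return {word: count / total for word, count in combined.items()}


# Hypothetical training data, partitioned by security domain.
data = {
    "public": ["the meeting is on monday"],
    "finance": ["the budget is confidential"],
    "hr": ["the salary review is pending"],
}
experts = {domain: train_expert(texts) for domain, texts in data.items()}

# Hypothetical access control policy: user -> domains that user may read.
policy = {"alice": ["public", "finance"], "bob": ["public"]}

alice_view = predict(experts, policy["alice"])
bob_view = predict(experts, policy["bob"])

print("budget" in alice_view)  # alice may see finance data
print("budget" in bob_view)    # bob may not
```

In this toy, a word that appears only in a domain the user cannot access never shows up in that user's distribution, which is the property the access policy is meant to guarantee.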
DISCO: Distilling Counterfactuals with Large Language Models
Chen, Zeming, Gao, Qiyue, Bosselut, Antoine, Sabharwal, Ashish, Richardson, Kyle
Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend to new counterfactual dimensions. In this work, we introduce DISCO (DIStilled COunterfactual Data), a new method for automatically generating high quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters these generations to distill high-quality counterfactual data. While task-agnostic, we apply our pipeline to the task of natural language inference (NLI) and find that on challenging evaluations such as the NLI stress test, comparatively smaller student models trained with DISCO generated counterfactuals are more robust (6% absolute) and generalize better across distributions (2%) compared to models trained without data augmentation. Furthermore, DISCO augmented models are 10% more consistent between counterfactual pairs on three evaluation sets, demonstrating that DISCO augmentation enables models to more reliably learn causal representations. Our repository is available at: https://github.com/eric11eca/disco
Zero-Shot Prompting for Implicit Intent Prediction and Recommendation with Commonsense Reasoning
Intelligent virtual assistants are currently designed to perform tasks or services explicitly mentioned by users, so multiple related domains or tasks need to be performed one by one through a long conversation with many explicit intents. In contrast, human assistants are capable of reasoning about (multiple) implicit intents based on user utterances via commonsense knowledge, reducing complex interactions and improving practicality. Therefore, this paper proposes a framework of multi-domain dialogue systems, which can automatically infer implicit intents based on user utterances and then perform zero-shot prompting using a large pre-trained language model to trigger suitable single task-oriented bots. The proposed framework is shown to be effective at realizing implicit intents and recommending associated bots in a zero-shot manner.
A Universal Discriminator for Zero-Shot Generalization
Xu, Haike, Lin, Zongyu, Zhou, Jing, Zheng, Yanan, Yang, Zhilin
Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization. In this work, we challenge this convention by showing that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks. Technically, we train a single discriminator to predict whether a text sample comes from the true data distribution, similar to GANs. Since many NLP tasks can be formulated as selecting from a few options, we use this discriminator to predict which concatenation of the input and an option has the highest probability of coming from the true data distribution. This simple formulation achieves state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by 16.0%, 7.8%, and 11.5% respectively on different scales. In the finetuning setting, our approach also achieves new state-of-the-art results on a wide range of NLP tasks, with only 1/4 of the parameters of previous methods. Meanwhile, our approach requires minimal prompting effort, which greatly improves robustness and is essential for real-world applications. Furthermore, we also jointly train a generalized universal discriminator (UD) in combination with generative tasks, which maintains its advantage on discriminative tasks and simultaneously works on generative tasks.
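The option-selection step described in this abstract can be sketched as follows. The scorer is a hypothetical stand-in: `toy_score` measures word overlap with two reference sentences and is not the trained discriminator from the paper. Only the control flow (concatenate the input with each option, score each candidate, take the highest) reflects the formulation above.

```python
# Hypothetical stand-in for "the true data distribution".
REFERENCE = [
    "the sky is blue",
    "water freezes at zero degrees celsius",
]


def toy_score(text):
    """Toy scorer: best Jaccard word overlap between `text` and a reference.

    A real discriminator would be a trained model returning the probability
    that `text` comes from the true data distribution.
    """
    words = set(text.lower().split())
    best = 0.0
    for ref in REFERENCE:
        ref_words = set(ref.split())
        overlap = len(words & ref_words) / len(words | ref_words)
        best = max(best, overlap)
    return best


def choose_option(input_text, options, score=toy_score):
    """Score each input+option concatenation and return the best option."""
    candidates = [f"{input_text} {option}" for option in options]
    scores = [score(candidate) for candidate in candidates]
    best_index = max(range(len(options)), key=scores.__getitem__)
    return options[best_index], scores


answer, scores = choose_option("the sky is", ["blue", "loud", "celsius"])
print(answer, scores)
```

Because the task is reduced to ranking a few concatenations, the same scorer can serve any task that can be phrased as a choice among options, which is what makes a single discriminator reusable across tasks.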
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Xiao, Guangxuan, Lin, Ji, Seznec, Mickael, Wu, Hao, Demouth, Julien, Han, Song
Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by offline migrating the quantization difficulty from activations to weights with a mathematically equivalent transformation. SmoothQuant enables INT8 quantization of both weights and activations for all the matrix multiplications in LLMs, including OPT, BLOOM, GLM, MT-NLG, and the LLaMA family. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. SmoothQuant enables serving a 530B LLM within a single node. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs. Code is available at https://github.com/mit-han-lab/smoothquant.
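The "mathematically equivalent transformation" mentioned in this abstract can be illustrated with a small pure-Python sketch. It divides each activation channel by a scale and multiplies the matching weight row by the same scale, so the product is unchanged while the activation outlier shrinks. The abstract does not give the scale formula; the per-channel rule in `smooth` and the `alpha` parameter are assumptions for illustration, and the INT8 quantization step itself is not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x, w, alpha=0.5):
    """Rescale activations x (tokens x channels) and weights w (channels x outputs).

    Assumed scale per channel j: max|x[:, j]|**alpha / max|w[j, :]|**(1 - alpha).
    Requires every channel to have a nonzero activation and a nonzero weight.
    """
    n_channels = len(w)
    scales = []
    for j in range(n_channels):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    # Divide activations by the scale, multiply the matching weight row by it.
    x_smooth = [[row[j] / scales[j] for j in range(n_channels)] for row in x]
    w_smooth = [[scales[j] * v for v in w[j]] for j in range(n_channels)]
    return x_smooth, w_smooth, scales


def abs_max(m):
    """Largest absolute value in a matrix."""
    return max(abs(v) for row in m for v in row)


# Channel 1 of the activations carries an outlier (40, -60).
x = [[0.5, 40.0, -0.2],
     [-0.3, -60.0, 0.1]]
w = [[0.2, -0.4],
     [0.1, 0.3],
     [-0.5, 0.25]]

x_s, w_s, s = smooth(x, w)
y = matmul(x, w)
y_s = matmul(x_s, w_s)

print(abs_max(x), abs_max(x_s))  # activation range before and after smoothing
print(abs_max(w), abs_max(w_s))  # weight range grows in exchange
```

The outputs `y` and `y_s` agree up to floating-point error, while the activation range narrows and the weight range widens: the quantization difficulty has moved from activations to weights, as the abstract describes.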