 Large Language Model


Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

arXiv.org Artificial Intelligence

Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, now draw on the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transformer models. Angel-PTM can efficiently train extremely large-scale models using hierarchical memory. The key designs of Angel-PTM are fine-grained memory management via the Page abstraction and a unified scheduling method that coordinates computations, data movements, and communications. Furthermore, Angel-PTM supports extreme model scaling with SSD storage and implements a lock-free updating mechanism to address SSD I/O bandwidth bottlenecks. Experimental results demonstrate that Angel-PTM outperforms existing systems by up to 114.8% in terms of maximum model scale as well as up to 88.9% in terms of training throughput. Additionally, experiments on GPT3-175B and T5-MoE-1.2T models utilizing hundreds of GPUs verify the strong scalability of Angel-PTM.


WADER at SemEval-2023 Task 9: A Weak-labelling framework for Data augmentation in tExt Regression Tasks

arXiv.org Artificial Intelligence

Intimacy is an essential element of human relationships and language is a crucial means of conveying it. Textual intimacy analysis can reveal social norms in different contexts and serve as a benchmark for testing computational models' ability to understand social information. In this paper, we propose WADER, a novel weak-labeling strategy for data augmentation in text regression tasks. WADER uses data augmentation to address the problems of data imbalance and data scarcity and provides a method for data augmentation in cross-lingual, zero-shot tasks. We benchmark the performance of state-of-the-art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data and optimally select augmentation candidates. Our results show that WADER outperforms the baseline model and provides a direction for mitigating data imbalance and scarcity in text regression tasks.


Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild

arXiv.org Artificial Intelligence

Our goal is to improve the reliability of Machine Learning (ML) systems deployed in the wild. ML models perform exceedingly well when test examples are similar to train examples. However, real-world applications are required to perform on any distribution of test examples. Current ML systems can fail silently on test examples with distribution shifts. In order to improve the reliability of ML models under covariate or domain shift, we propose algorithms that enable models to: (a) generalize to a larger family of test distributions, (b) evaluate accuracy under distribution shifts, (c) adapt to a target distribution. We study causes of impaired robustness to domain shifts and present algorithms for training domain robust models. A key source of model brittleness is domain overfitting, which our new training algorithms suppress, encouraging domain-general hypotheses instead. While we improve robustness over standard training methods for certain problem settings, performance of ML systems can still vary drastically with domain shifts. It is crucial for developers and stakeholders to understand model vulnerabilities and operational ranges of input, which could be assessed on the fly during deployment, albeit at a great cost. Instead, we advocate for proactively estimating accuracy surfaces over any combination of prespecified and interpretable domain shifts for performance forecasting. We present a label-efficient method for estimation over a combinatorial space of domain shifts. Further, when a model's performance on a target domain is found to be poor, traditional approaches adapt the model using the target domain's resources. Standard adaptation methods assume access to sufficient labeled resources, which may be impractical for deployed models. We initiate a study of lightweight adaptation techniques with only unlabeled data resources with a focus on language applications.


Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

arXiv.org Artificial Intelligence

Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters.

Finetuning pretrained language models (PLMs) has led to significant improvements across various downstream NLP tasks (Devlin et al., 2019; Howard & Ruder, 2018; Raffel et al., 2020). However, the conventional paradigm of full task-specific finetuning (FT) is difficult to scale to multiple tasks, given that modern PLMs can have hundreds of millions (or even billions) of parameters. There thus has been a growing interest in developing parameter-efficient methods for model tuning (Houlsby et al., 2019; Lester et al., 2021; Ding et al., 2022), where the goal is to learn only a small number of additional parameters per task while achieving performance comparable to full finetuning.

Work done during an internship at MIT-IBM Watson AI Lab.

Figure 2: Parameter efficiency on GLUE (left) and SuperGLUE (right). Our multitask prompt tuning (MPT) approach, which transfers a single shared prompt learned from multiple source tasks using prompt decomposition and distillation, maintains high accuracy (y-axis) while finetuning only a small number of parameters per task (x-axis).
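
To make the "multiplicative low rank updates" in the MPT abstract more concrete, here is a minimal NumPy sketch of one plausible reading: each task prompt is the shared prompt multiplied elementwise by a low-rank matrix built from two small task-specific factors. The function names, the shapes (prompt length 100, hidden size 768, rank 1), and the ones-initialisation are illustrative assumptions, not details taken from the abstract, and the sketch leaves out the distillation step entirely.

```python
import numpy as np


def task_prompt(shared_prompt, u, v):
    """Adapt a shared prompt to one task with a multiplicative low-rank update.

    shared_prompt: (prompt_len, hidden) matrix shared across tasks.
    u: (prompt_len, rank) task-specific factor (hypothetical name).
    v: (hidden, rank) task-specific factor (hypothetical name).
    Returns the elementwise product of the shared prompt with u @ v.T.
    """
    low_rank = u @ v.T  # shape (prompt_len, hidden), rank at most `rank`
    return shared_prompt * low_rank


def per_task_param_count(prompt_len, hidden, rank):
    """Number of task-specific parameters held in the two factors u and v."""
    return rank * (prompt_len + hidden)


if __name__ == "__main__":
    # Assumed sizes, chosen only for illustration.
    prompt_len, hidden, rank = 100, 768, 1
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(prompt_len, hidden))

    # With rank 1 and all-ones factors the update is the identity,
    # so the task prompt starts out equal to the shared prompt.
    u = np.ones((prompt_len, rank))
    v = np.ones((hidden, rank))
    adapted = task_prompt(shared, u, v)

    print(adapted.shape)
    print(per_task_param_count(prompt_len, hidden, rank), "task-specific values vs",
          prompt_len * hidden, "for a full prompt")
```

Under these assumed sizes, the per-task factors hold far fewer values than a full prompt matrix, which is the kind of saving the abstract describes; the exact 0.035% figure depends on the authors' own configuration and is not reproduced by this sketch.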


DeepStruct: Pretraining of Language Models for Structure Prediction

arXiv.org Artificial Intelligence

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We study the performance of this approach on 28 datasets, spanning 10 structure prediction tasks including open information extraction, joint entity and relation extraction, named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, factual probe, intent detection, and dialogue state tracking. We further enhance the pretraining with the task-specific training sets. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that we evaluate.


Low Emission Building Control with Zero-Shot Reinforcement Learning

arXiv.org Artificial Intelligence

Heating and cooling systems in buildings account for 31% of global energy use, much of which is regulated by Rule Based Controllers (RBCs) that neither maximise energy efficiency nor minimise emissions by interacting optimally with the grid. Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency, but existing solutions require access to building-specific simulators or data that cannot be expected for every building in the world. In response, we show it is possible to obtain emission-reducing policies without such knowledge a priori, a paradigm we call zero-shot building control. We combine ideas from system identification and model-based RL to create PEARL (Probabilistic Emission-Abating Reinforcement Learning) and show that a short period of active exploration is all that is required to build a performant model. In experiments across three varied building energy simulations, we show PEARL outperforms an existing RBC once, and popular RL baselines in all cases, reducing building emissions by as much as 31% whilst maintaining thermal comfort. Our source code is available online via https://enjeeneer.io/projects/pearl/


AI apps such as ChatGPT could play a role in Whitehall, says science secretary

#artificialintelligence

Artificial intelligence systems such as ChatGPT could play a role in Whitehall and represent a "massive opportunity", the new science secretary has suggested. Michelle Donelan, who took over the new role after the prime minister's departmental reshuffle last month, said the civil service should rely on its own experts but did not rule out a role for artificial intelligence in the future. ChatGPT can generate articles, essays, jokes, poetry and job applications in response to text prompts. OpenAI, a private company backed by Microsoft, made it available to the public for free in November. It can respond to questions in a human-like manner and understand the context of follow-up queries much like in human conversations, as well as being able to compose longform pieces of writing if asked.


'AI-powered' search is off to a problematic start. Can Google and Bing fix it?

#artificialintelligence

The era of AI-generated conversational search is, apparently, here. On 16th December I published a piece about whether ChatGPT could pose a threat to Google, as many were already suggesting that it might, just two and a half weeks on from the chatbot's release. At the time of writing, neither Google nor Microsoft – a major backer of ChatGPT's parent organisation, OpenAI – had indicated any plans to actually integrate technology like ChatGPT into their search engines, and the idea seemed like a far-off possibility. While ChatGPT is an impressive conversational chatbot, it has some significant drawbacks, particularly as an arbiter of facts and information: large language models (LLMs) like ChatGPT have a tendency to "hallucinate" (the technical term) and confidently state wrong information, a problem that ChatGPT's makers have called "challenging" to fix. But the idea of a chat-based search interface has its appeal.


Analyzing OpenAI's investment strategy

#artificialintelligence

By investing in the next generation of AI startups, OpenAI could cement its status as the go-to AI model developer -- and chart a course toward profitability. We break down where it's investing and the areas it could target next.


ChatSonic Launches Its ChatGPT-like Google Chrome Extension

#artificialintelligence

On Saturday, ChatSonic, an initiative of WriteSonic, announced it had launched its Google Chrome extension for Gmail, Twitter, LinkedIn, and the web. Hailed as the best alternative to ChatGPT due to its real-time data, ChatSonic will help users be more efficient and productive on Gmail, LinkedIn, Twitter, etc. ChatGPT is built on a data set limited to September 21, 2022, claims the startup. "In today's busy world, while many people are creating tools to boost productivity, sometimes, we have used a common-sense approach and launched a Google Chrome extension to help users handle their email content, create tweets, LinkedIn posts, and create unique content anywhere on Chrome without having to leave the page," shared Samayou Garg, founder, WriteSonic. Garg shares that ChatSonic has generated over 6 million pieces of curated content and has grown 10x in the last two months. For Twitter, it can help you write creative, unique tweets with relevant hashtags, suggest which accounts to engage with, and summarise long Twitter threads.