Large Language Model
Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models
Albalak, Alon, Shrivastava, Akshat, Sankar, Chinnadhurai, Sagar, Adithya, Ross, Mike
Multi-task learning (MTL), instruction tuning, and prompting have recently been shown to improve the generalizability of large language models to new tasks. However, the benefits of such methods are less well-documented in smaller language models, with some studies finding contradictory results. In this work, we explore and isolate the effects of (i) model size, (ii) general purpose MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot fine-tuning for models with fewer than 500 million parameters. Our experiments in the zero-shot setting demonstrate that models gain 31% relative improvement, on average, from general purpose MTL, with an additional 37.6% relative gain from in-domain MTL. Contrary to prior work on large models, we find that instruction tuning provides a modest 2% performance improvement for small models.
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
Kwon, Se Jung, Kim, Jeonghoon, Bae, Jeongin, Yoo, Kang Min, Kim, Jin-Hwa, Park, Baeseong, Kim, Byeongwook, Ha, Jung-Woo, Sung, Nako, Lee, Dongsoo
There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression has not yet been thoroughly explored. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, which consists of post-training quantization of the pre-trained language model and fine-tuning only some parts of the quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes the full-precision parameters into binary parameters and a separate set of scaling factors. During the adaptation phase, the binary values are frozen for all tasks, while the scaling factors are fine-tuned for the downstream task. We demonstrate that AlphaTuning, when applied to GPT-2 and OPT, performs competitively with full fine-tuning on a variety of downstream tasks while achieving >10x compression ratio under 4-bit quantization and >1,000x reduction in the number of trainable parameters.
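The factorization into binary codes and scaling factors can be illustrated with a small sketch. It uses the common greedy approximation for binary-coding quantization on a single weight vector; the function names (`bcq_greedy`, `reconstruct`, `squared_error`) and the toy weights are assumptions made for illustration, and the paper's actual quantization procedure and grouping may differ.

```python
# Minimal sketch of greedy binary-coding quantization (BCQ) for one weight vector.
# Assumption: this is the standard greedy approximation, not necessarily the
# exact procedure used by AlphaTuning.

def bcq_greedy(weights, num_bits):
    """Approximate weights as sum_i alphas[i] * codes[i], with codes in {-1, +1}."""
    residual = list(weights)
    alphas, codes = [], []
    for _ in range(num_bits):
        # Binary code: the sign of the current residual.
        code = [1 if r >= 0 else -1 for r in residual]
        # Scaling factor: mean absolute residual (least-squares optimal for this code).
        alpha = sum(abs(r) for r in residual) / len(residual)
        alphas.append(alpha)
        codes.append(code)
        # Subtract what this bit explains and continue on the remainder.
        residual = [r - alpha * c for r, c in zip(residual, code)]
    return alphas, codes


def reconstruct(alphas, codes):
    """Rebuild the approximate weights from scaling factors and binary codes."""
    n = len(codes[0])
    return [sum(a * c[j] for a, c in zip(alphas, codes)) for j in range(n)]


def squared_error(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


weights = [0.9, -0.4, 0.1, -1.2, 0.5, 0.3]  # toy values, not from the paper
for bits in (1, 2, 3):
    alphas, codes = bcq_greedy(weights, bits)
    print(bits, squared_error(weights, reconstruct(alphas, codes)))
```

In this picture, adaptation in the spirit of the abstract would keep `codes` fixed and update only `alphas`, so the number of trainable values per vector equals the number of bits rather than the number of weights.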
Zero-shot stance detection based on cross-domain feature enhancement by contrastive learning
Zhao, Xuechen, Zou, Jiaying, Zhang, Zhong, Xie, Feng, Zhou, Bin, Tian, Lei
Zero-shot stance detection is challenging because it requires detecting the stance of previously unseen targets in the inference phase. The ability to learn transferable target-invariant features is critical for zero-shot stance detection. In this work, we propose a stance detection approach that can efficiently adapt to unseen targets, the core of which is to capture target-invariant syntactic expression patterns as transferable knowledge. Specifically, we first augment the data by masking the topic words of sentences, and then feed the augmented data to an unsupervised contrastive learning module to capture transferable features. Then, to fit a specific target, we encode the raw texts as target-specific features. Finally, we adopt an attention mechanism, which combines syntactic expression patterns with target-specific features to obtain enhanced features for predicting previously unseen targets. Experiments demonstrate that our model outperforms competitive baselines on four benchmark datasets.
Automatic Chain of Thought Prompting in Large Language Models
Zhang, Zhuosheng, Zhang, Aston, Li, Mu, Smola, Alex
Large language models (LLMs) can perform complex reasoning by generating intermediate reasoning steps. Providing these steps for prompting demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has two major paradigms. One leverages a simple prompt like "Let's think step by step" to facilitate step-by-step thinking before answering a question. The other uses a few manual demonstrations one by one, each composed of a question and a reasoning chain that leads to an answer. The superior performance of the second paradigm hinges on the hand-crafting of task-specific demonstrations one by one. We show that such manual efforts may be eliminated by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one, i.e., let's think not just step by step, but also one by one. However, these generated chains often come with mistakes. To mitigate the effect of such mistakes, we find that diversity matters for automatically constructing demonstrations. We propose an automatic CoT prompting method: Auto-CoT. It samples questions with diversity and generates reasoning chains to construct demonstrations. On ten public benchmark reasoning tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of the CoT paradigm that requires manual designs of demonstrations. Code is available at https://github.com/amazon-research/auto-cot
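The two ingredients named in the abstract, diversity-aware question sampling and zero-shot chain generation, can be sketched roughly as follows. This is not the released Auto-CoT code: greedy farthest-point selection over word-overlap distance is a simple stand-in for the paper's diversity sampling, and `generate` is a hypothetical callable that wraps whatever LLM is available.

```python
# Rough sketch of automatic demonstration construction.
# Assumptions: Jaccard word overlap as the similarity measure, greedy
# farthest-point selection as the diversity heuristic, and a caller-supplied
# `generate(prompt) -> str` function standing in for an LLM.

TRIGGER = "Let's think step by step."


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def select_diverse(questions, k):
    """Pick k questions, each one as far as possible from those already chosen."""
    if not questions or k <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(questions)):
        candidates = [i for i in range(len(questions)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: min(
                jaccard_distance(questions[i], questions[j]) for j in chosen
            ),
        )
        chosen.append(best)
    return [questions[i] for i in chosen]


def build_demonstrations(questions, k, generate):
    """Generate one reasoning chain per selected question with the zero-shot trigger."""
    demos = []
    for q in select_diverse(questions, k):
        prompt = f"Q: {q}\nA: {TRIGGER}"
        demos.append(f"{prompt} {generate(prompt)}")
    return demos


def build_prompt(demos, test_question):
    """Prepend the generated demonstrations to the test question."""
    return "\n\n".join(demos + [f"Q: {test_question}\nA: {TRIGGER}"])


if __name__ == "__main__":
    pool = [
        "Tom has 3 apples and buys 2 more apples",
        "Tom has 5 apples and buys 4 more apples",
        "A train travels 60 miles in 2 hours",
    ]
    fake_llm = lambda prompt: "(model-generated reasoning would go here)"
    print(build_prompt(build_demonstrations(pool, 2, fake_llm), "What is 7 plus 8?"))
```

Because the generated chains can contain mistakes, selecting dissimilar questions is meant to reduce the chance that several demonstrations share the same error.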
See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction
Attarian, Maria, Gupta, Advaya, Zhou, Ziyi, Yu, Wei, Gilitschenski, Igor, Garg, Animesh
Cognitive planning is the structural decomposition of complex tasks into a sequence of future behaviors. In the computational setting, performing cognitive planning entails grounding plans and concepts in one or more modalities in order to leverage them for low-level control. Since real-world tasks are often described in natural language, we devise a cognitive planning algorithm via language-guided video prediction. Current video prediction models do not support conditioning on natural language instructions. Therefore, we propose a new video prediction architecture which leverages the power of pre-trained transformers. The network is endowed with the ability to ground concepts based on natural language input with generalization to unseen objects. We demonstrate the effectiveness of this approach on a new simulation dataset, where each task is defined by a high-level action described in natural language. Our experiments compare our method against one video generation baseline without planning or action grounding and showcase significant improvements. Our ablation studies highlight an improved generalization to unseen objects that natural language embeddings offer to concept grounding ability, as well as the importance of planning towards visual "imagination" of a task.
AI system from DeepMind creates better versions of mathematical algorithms - Technology Org
The research team from DeepMind has recently published a paper where they introduce a new AI platform for improving mathematical algorithms. The new system is known under the name of AlphaTensor, which is an extension of AlphaZero to the discipline of mathematics. This tool takes classic fundamental algorithms as input and produces their improved versions. The team used a matrix multiplication algorithm as a basis for their study, where they analyzed a 50-year-old problem of finding the fastest way to multiply two matrices. Matrix multiplication may seem a very boring part of mathematics requiring lots of time spent multiplying and adding numbers.
The Download: TikTok moral panics, and DeepMind's record-breaking AI
Despite what you may have heard, teens are not stealing their family's fine dinnerware, tossing it in a blender, and snorting the resulting dust for the "porcelain challenge." That's just what Sebastian Durfee, a 23-year-old actor and TikTok creator, hoped you might believe when he spread the word on social media of the latest dangerous teen challenge. Never mind that it was all fake from the start. Last week, Durfee posted a call to action to his followers: to work together to get "boomers to freak out about a fake TikTok challenge." His account was banned just a few days later, but his goal wasn't just to rack up views. It was also to examine how attention and outrage work online, and, in a new twist, to trick the very people who were in on the joke in the first place.
DeepMind unveils first AI to discover faster matrix multiplication algorithms
Can artificial intelligence (AI) create its own algorithms to speed up matrix multiplication, one of machine learning's most fundamental tasks? Today, in a paper published in Nature, DeepMind unveiled AlphaTensor, the "first artificial intelligence system for discovering novel, efficient and provably correct algorithms." The Google-owned lab said the research "sheds light" on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices. Ever since the Strassen algorithm was published in 1969, computer science has been on a quest to surpass its speed of multiplying two matrices.
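For context on what "faster" means here, the sketch below shows Strassen's classic scheme for 2x2 matrices, which uses 7 scalar multiplications instead of the 8 needed by the schoolbook method. It illustrates only the 1969 algorithm mentioned above, not anything discovered by AlphaTensor; the function name `strassen_2x2` is chosen for this example.

```python
# Strassen's 2x2 matrix multiplication: 7 multiplications instead of 8.
# Matrices are plain nested lists: [[a11, a12], [a21, a22]].

def strassen_2x2(a, b):
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Applied recursively to matrix blocks, saving one multiplication per 2x2 step is what lowers the asymptotic cost below that of the schoolbook method.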
Comprehensive Guide to Zero-Shot and K-Shot Learning
Deep neural networks have achieved state-of-the-art results on many computer vision tasks. However, much of this performance improvement can be attributed to their reliance on large amounts of supervised information for learning. There are many practical cases in which such training data is not available. Few-shot learning is an approach designed to deal with such cases. It is a type of supervised learning that is intended to rapidly generalise, based on prior knowledge, to new tasks for which only a few samples of supervised information are available.
Language Models are Multilingual Chain-of-Thought Reasoners
Shi, Freda, Suzgun, Mirac, Freitag, Markus, Wang, Xuezhi, Srivats, Suraj, Vosoughi, Soroush, Chung, Hyung Won, Tay, Yi, Ruder, Sebastian, Zhou, Denny, Das, Dipanjan, Wei, Jason
We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment. The MGSM benchmark is publicly available at https://github.com/google-research/url-nlp.