Collaborating Authors

 Lewkowycz, Aitor


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

arXiv.org Artificial Intelligence

Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
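
To make the evaluation setup concrete: many BIG-bench tasks are, at their simplest, JSON files of input/target examples scored by a metric such as exact match. The following is only a rough sketch of such an evaluation loop, not the benchmark's actual API; the task contents and the `model_fn` callable are invented stand-ins for a real task file and a real language model.

    import json

    # A minimal BIG-bench-style JSON task: a list of input/target examples.
    # (Real BIG-bench JSON tasks carry more metadata; this is a stripped-down sketch.)
    TASK_JSON = """
    {
      "name": "toy_arithmetic",
      "examples": [
        {"input": "What is 7 plus 5?", "target": "12"},
        {"input": "What is 9 minus 4?", "target": "5"}
      ]
    }
    """

    def exact_match_score(task, model_fn):
        """Fraction of examples where the model's output matches the target
        exactly, after whitespace and case normalization."""
        hits = 0
        for ex in task["examples"]:
            prediction = model_fn(ex["input"])
            hits += prediction.strip().lower() == ex["target"].strip().lower()
        return hits / len(task["examples"])

    if __name__ == "__main__":
        task = json.loads(TASK_JSON)
        # `model_fn` stands in for a call to whichever model is under evaluation.
        dummy_model = lambda prompt: "12"
        print(exact_match_score(task, dummy_model))  # 0.5 on the toy task

Sweeping the same loop over checkpoints of different sizes is what produces the scaling curves the abstract describes, with per-task metrics either improving smoothly or jumping at a critical scale.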


Language Model Cascades

arXiv.org Artificial Intelligence

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow implementing disparate model structures and inference strategies in a unified language. In this position paper, we argue that a useful unifying framework for understanding and extending this disparate body of work is in terms of probabilistic programming languages (PPLs) extended to work with strings, instead of more atomic data types like integers and floats. That is, we use a PPL to define a joint probability model on string-valued random variables, parameterized using LMs, and then condition this model on string-valued observations in order to compute a posterior over string-valued unknowns, which we can then infer. We call such a probabilistic program a language model cascade. We formalize several existing techniques from this perspective. We show that this framework captures many recent approaches, and also allows us to tackle more complex multi-step reasoning problems.
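
As a minimal sketch of the cascade idea (the `TOY_LM` table and sampler functions below are invented for illustration; a real cascade would prompt an actual LM and use more efficient inference than rejection sampling): a two-step cascade samples a thought and then an answer for a question, and conditioning on an observed answer approximates the posterior over thoughts.

    import random

    # Toy stand-in for a prompted language model sampler. A real cascade would
    # call an actual LM with a string prompt; a tiny hand-written table keeps
    # this sketch self-contained and runnable.
    TOY_LM = {
        "thought": ["2 + 3 means counting up 3 from 2.", "Addition: combine 2 and 3."],
        "answer": ["5", "6"],
    }

    def sample_thought(question: str) -> str:
        # t ~ p(t | question): a string-valued random variable parameterized by the LM.
        return random.choice(TOY_LM["thought"])

    def sample_answer(question: str, thought: str) -> str:
        # a ~ p(a | question, t): a real prompt would concatenate question and
        # thought; the toy table ignores the conditioning to stay tiny.
        return random.choice(TOY_LM["answer"])

    def cascade(question: str):
        """One joint sample (thought, answer) from the cascade."""
        thought = sample_thought(question)
        answer = sample_answer(question, thought)
        return thought, answer

    def posterior_thoughts(question: str, observed_answer: str, n: int = 1000):
        """Rejection sampling: keep thoughts whose sampled answer matches the
        observation, approximating p(thought | question, answer)."""
        return [t for t, a in (cascade(question) for _ in range(n))
                if a == observed_answer]

    if __name__ == "__main__":
        accepted = posterior_thoughts("What is 2 + 3?", observed_answer="5")
        print(f"{len(accepted)} accepted samples; e.g. {accepted[:1]}")

The design point the paper makes is that chain-of-thought prompting, verifiers, and tool use are all instances of this pattern: string-valued random variables wired into a graphical model, with observations conditioning the rest.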


On the training dynamics of deep networks with $L_2$ regularization

arXiv.org Machine Learning

We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks.
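
The abstract does not spell out the schedule, so the following is only an illustrative sketch under an assumed decay form lam(t) = lam0 / (1 + t/T), which is not the paper's actual proposal; it just shows where a time-dependent $L_2$ coefficient enters a plain SGD update, on a toy overparameterized regression problem.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy overparameterized setting: 128 weights, only 32 data points.
    X = rng.normal(size=(32, 128))
    y = X @ rng.normal(size=128) + 0.1 * rng.normal(size=32)

    def l2_coefficient(step, lam0=1e-2, decay_steps=500):
        # Illustrative dynamical schedule (assumed form, not the paper's):
        # start with a large L2 coefficient and decay it during training.
        return lam0 / (1.0 + step / decay_steps)

    w = np.zeros(128)
    lr = 1e-3
    for step in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        lam = l2_coefficient(step)
        w -= lr * (grad + 2 * lam * w)         # penalty lam * ||w||^2 adds 2*lam*w
    print("final training loss:", float(np.mean((X @ w - y) ** 2)))

The intuition for a decaying schedule is that a large early coefficient regularizes aggressively while a small late coefficient lets the overparameterized model fit the data; the paper's empirical relations between the coefficient, learning rate, and step count are what motivate choosing such a schedule rather than a fixed value.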