
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Zhang, Biao, Liu, Zhongtao, Cherry, Colin, Firat, Orhan

arXiv.org Artificial Intelligence

While large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications, our understanding of the inductive biases (especially the scaling properties) of different finetuning methods is still limited. To fill this gap, we conduct systematic experiments studying whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size, and finetuning data size, affect the finetuning performance. We consider two types of finetuning, full-model tuning (FMT) and parameter-efficient tuning (PET, including prompt tuning and LoRA), and explore their scaling behaviors in the data-limited regime where the LLM model size substantially outweighs the finetuning data size. Based on two sets of pretrained bilingual LLMs from 1B to 16B and experiments on bilingual machine translation and multilingual summarization benchmarks, we find that 1) LLM finetuning follows a power-based multiplicative joint scaling law between finetuning data size and each other scaling factor; 2) LLM finetuning benefits more from LLM model scaling than pretraining data scaling, and PET parameter scaling is generally ineffective; and 3) the optimal finetuning method is highly task- and finetuning-data-dependent. We hope our findings shed light on understanding, selecting, and developing LLM finetuning methods.

Advanced LLMs, such as GPT-4 (OpenAI, 2023) and PaLM 2 (Anil et al., 2023), often show emergent capabilities and allow for in-context learning, using just a few demonstration examples to perform complex reasoning and generation tasks (Wei et al., 2022; Zhang et al., 2023; Fu et al., 2023; Shen et al., 2023). Still, LLM finetuning is required and widely adopted to unlock new and robust capabilities for creative tasks, get the most out of focused downstream tasks, and align model behavior with human preferences (Ouyang et al., 2022; Yang et al., 2023; Gong et al., 2023; Schick et al., 2023).
This becomes more significant in traditional industrial applications due to the existence of large-scale annotated task-specific data accumulated over years.
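Because the joint law is multiplicative and power-based, log-loss is linear in the log of each scaling factor, so each exponent can be read off from a simple doubling experiment. A minimal sketch of that idea follows; the constants and the exact functional form below are invented for illustration, not fitted values from the paper:

```python
import math

# Hypothetical multiplicative joint power law: finetuning loss
# ~ A * X^(-alpha) * Df^(-beta), where X is one scaling factor
# (e.g. LLM model size) and Df is the finetuning data size.
A, alpha, beta = 5.0, 0.15, 0.30

def loss(x, df):
    """Predicted finetuning loss under the toy multiplicative law."""
    return A * x ** (-alpha) * df ** (-beta)

def exponent(f, base=1e9):
    """Recover a power-law exponent by doubling one factor."""
    return -(math.log(f(2 * base)) - math.log(f(base))) / math.log(2)

est_alpha = exponent(lambda x: loss(x, 1e6))   # vary model size, fix data
est_beta = exponent(lambda df: loss(1e9, df))  # vary data, fix model size
```

With real measurements one would fit all exponents jointly by log-linear regression rather than doubling one factor at a time, but the additive-in-logs structure being exploited is the same.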


How you can train AI to convert design mockups into HTML and CSS

#artificialintelligence

Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now. In this post, we'll teach a neural network how to code a b...


Turning Design Mockups Into Code With Deep Learning - FloydHub Blog

#artificialintelligence

Within three years deep learning will change front-end development. It will increase prototyping speed and lower the barrier for building software. The field took off last year when Tony Beltramelli introduced the pix2code paper and Airbnb launched sketch2code. Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now.
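The "synthesized training data" idea can be sketched in a few lines: generate paired (design-token, HTML) examples programmatically and train on those. In a real pix2code-style pipeline the input side would be rendered screenshots and the output a markup DSL; the component set and token scheme below are purely hypothetical:

```python
import random

# Toy synthesis of (design tokens, HTML) training pairs. The component
# templates here are invented for illustration only.
COMPONENTS = {
    "header": "<header><h1>{}</h1></header>",
    "button": '<button type="button">{}</button>',
    "text": "<p>{}</p>",
}

def synthesize_example(rng):
    """Sample a random layout and the HTML it should map to."""
    tokens = rng.choices(list(COMPONENTS), k=rng.randint(2, 5))
    html = "\n".join(COMPONENTS[t].format(t.capitalize()) for t in tokens)
    return tokens, html

rng = random.Random(0)  # seeded so the dataset is reproducible
dataset = [synthesize_example(rng) for _ in range(100)]
```

Because both sides are generated from the same sampled layout, the pairs are perfectly aligned, which is exactly what makes synthetic data attractive when real annotated mockups are scarce.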


For The First Time, AI Can Teach Itself Any Language On Earth

#artificialintelligence

To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world: in order, Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese. Google's system uses human-supervised neural networks that compare parallel texts: books and articles that have previously been translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, thus acquiring the ability to quickly translate between them. Sometimes the translations are funny or don't really capture the original meaning but, in general, they are functional and, over time, they're getting better and better.
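The core signal being exploited, learning word equivalences from sentence-aligned parallel text, can be illustrated with a toy co-occurrence model. Real systems (statistical or neural) are vastly more sophisticated; the three-sentence corpus and Dice scoring below are only a sketch:

```python
from collections import Counter

# Tiny sentence-aligned English-French "parallel corpus" for illustration.
parallel = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]

cooc, src_freq, tgt_freq = Counter(), Counter(), Counter()
for src, tgt in parallel:
    s_words, t_words = src.split(), tgt.split()
    src_freq.update(s_words)
    tgt_freq.update(t_words)
    for s in s_words:
        for t in t_words:
            cooc[(s, t)] += 1  # count co-occurrence in aligned sentences

def best_match(word):
    """Pick the target word with the highest Dice coefficient, which
    rewards pairs that occur together and rarely apart."""
    scores = {t: 2 * c / (src_freq[word] + tgt_freq[t])
              for (s, t), c in cooc.items() if s == word}
    return max(scores, key=scores.get)
```

Even this crude score already separates "cat"/"chat" from frequent function words like "le", which is the same disambiguation problem that real alignment models solve at scale.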


The 1996 Simon Newcomb Award

AI Magazine

His proofs are ingenious, cleverly argued, quite convincing to many of his contemporaries, and utterly wrong. The Simon Newcomb Award is given annually for the silliest published argument attacking AI. Our subject may be unique in the virulence and frequency with which it is attacked, both in the popular media and among the cultured intelligentsia. Recent articles have argued that the very idea of AI reflects a cancer in the heart of our culture and have proven (yet again) that it is impossible. While many of these attacks are cited widely, most of them are ridiculous to anyone with an appropriate technical education.


Learning Language Using a Pattern Recognition Approach

AI Magazine

IBM Palo Alto Scientific Center, 2530 Page Mill Road, Palo Alto, CA 94303. Abstract: A pattern recognition algorithm is described that learns a transition-net grammar from positive examples. Two sets of examples, one in English and one in Chinese, are presented. It is hoped that language learning will reduce the knowledge-acquisition effort for expert systems and make the natural language interface to database systems more transportable. The algorithm presented makes a step in that direction by providing a robust parser and reducing the special interaction needed to introduce new words and terms. We are developing a natural language interface to an expert system for message processing.
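A minimal stand-in for learning a transition network from positive examples is a prefix-tree acceptor: a network whose arcs are labeled with words and whose shared prefixes are merged into common states. The paper's algorithm also generalizes states beyond this; the sketch below covers only the construction step, on an invented corpus:

```python
def build_network(sentences):
    """Build a prefix-tree acceptor: (state, word) -> state transitions
    plus a set of accepting states. State 0 is the start state."""
    transitions, accepting, next_state = {}, set(), 1
    for sentence in sentences:
        state = 0
        for word in sentence.split():
            key = (state, word)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

def accepts(net, sentence):
    """Walk the network; accept only if we end in an accepting state."""
    transitions, accepting = net
    state = 0
    for word in sentence.split():
        if (state, word) not in transitions:
            return False
        state = transitions[(state, word)]
    return state in accepting

net = build_network(["the cat sleeps", "the dog sleeps", "the cat eats"])
```

The resulting network accepts exactly the training sentences; the interesting part of a grammar learner is then merging states so that unseen but grammatical combinations ("the dog eats") are also accepted.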


Knowledge Interchange Format: The KIF of Death

AI Magazine

There has been a flurry of interest recently in the possibility of standardizing existing work on knowledge representation; this interest is supported by the Defense Advanced Research Projects Agency (DARPA) and other funding agencies. An examination of recent work on knowledge representation makes it clear that there are deep differences among the approaches taken. Those supporting knowledge representation standards are attempting to address this difficulty by creating a single language in which all knowledge representation schemes can be expressed (Genesereth 1990), but this task seems impossible given the current state of the field. Moreover, even if it were, it is surely not possible to construct a language that will also incorporate all future knowledge representation work, other than in the trivial sense guaranteed by the universality of some specific method, such as first-order logic or a general-purpose programming language. Furthermore, attempts in this direction will inevitably constrain future knowledge representation efforts; even gentle constraints might have a stifling impact on future knowledge representation work.


The Process Specification Language (PSL)

AI Magazine

However, interoperability among these manufacturing applications is hindered because the applications use different terminology and representations of the domain. These problems arise most acutely for systems that must manage the heterogeneity inherent in various domains and integrate models of different domains into coherent frameworks (figure 1). For example, such integration occurs in businessprocess reengineering, where enterprise models integrate processes, organizations, goals, and customers. Even when applications use the same terminology, they often associate different semantics with the terms. This clash over the meaning of the terms prevents the seamless exchange of information among the applications.


Statistical Techniques for Natural Language Parsing

AI Magazine

I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and which also serves as a good warm-up for the main topic: statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions. In this example, I adopt the standard abbreviations: s for sentence, np for noun phrase, vp for verb phrase, and det for determiner. It is generally accepted that finding the sort of structure shown in figure 1 is useful in determining the meaning of a sentence.
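The standard warm-up mentioned above, statistical part-of-speech tagging, starts from a very simple baseline: give each word the tag it received most often in training data. The tiny tagged corpus and tag names below are invented for illustration; real taggers add context (e.g. HMMs over tag sequences) on top of exactly these counts:

```python
from collections import Counter, defaultdict

# Invented training data: sentences as (word, tag) pairs.
tagged_corpus = [
    [("the", "det"), ("dog", "noun"), ("barks", "verb")],
    [("the", "det"), ("cat", "noun"), ("sleeps", "verb")],
    [("dogs", "noun"), ("bark", "verb")],
]

# Count how often each word carries each tag.
counts = defaultdict(Counter)
for sentence in tagged_corpus:
    for word, tag in sentence:
        counts[word][tag] += 1

def tag(words, default="noun"):
    """Most-frequent-tag baseline; unknown words fall back to a
    common open-class tag."""
    return [counts[w].most_common(1)[0][0] if w in counts else default
            for w in words]
```

Despite its simplicity, this baseline is famously hard to beat by a large margin, which is why it is the conventional starting point before moving to sequence models and full statistical parsers.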