Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators
Chang, Jiayi, Gao, Mingqi, Hu, Xinyu, Wan, Xiaojun
Previous research has shown that LLMs have potential in multilingual NLG evaluation tasks. However, existing research has not fully explored the differences in the evaluation capabilities of LLMs across different languages. To this end, this study provides a comprehensive analysis of the multilingual evaluation performance of 10 recent LLMs, spanning high-resource and low-resource languages, through correlation analysis, perturbation attacks, and fine-tuning. We found that 1) excluding the reference answer from the prompt and using large-parameter LLM-based evaluators leads to better performance across various languages; 2) most LLM-based evaluators show a higher correlation with human judgments in high-resource languages than in low-resource languages; 3) in the languages where evaluators are most sensitive to perturbation attacks, they also tend to exhibit the highest correlation with human judgments; and 4) fine-tuning with data from a particular language yields a broadly consistent enhancement in the model's evaluation performance across diverse languages. Our findings highlight the imbalance in LLMs' evaluation capabilities across different languages and suggest that low-resource language scenarios deserve more attention.
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.34)
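The correlation analysis named in the abstract is typically a segment-level rank correlation between evaluator scores and human ratings. A minimal sketch (all scores below are invented illustration data, not from the paper):

```python
# Minimal sketch of segment-level correlation analysis: compare hypothetical
# LLM-evaluator scores against human ratings with Spearman's rank correlation.

def ranks(values):
    """Rank values from 1..n (no tie handling for this toy example)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rho via the classic formula 1 - 6*sum(d^2) / (n(n^2-1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n**2 - 1))

human = [4.0, 2.0, 5.0, 3.0, 1.0, 4.5, 2.5, 5.5]   # human quality ratings
llm   = [3.8, 2.1, 4.9, 3.2, 1.5, 3.9, 2.4, 4.7]   # evaluator outputs

print(round(spearman(human, llm), 3))  # 0.976
```

A high rho means the evaluator ranks outputs much like the human annotators do, which is the quantity the study compares across high- and low-resource languages.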
Targeted Distillation for Sentiment Analysis
Zhang, Yice, Xie, Guangyu, Lin, Jingjie, Bao, Jianzhu, Wang, Qianlong, Zeng, Xi, Xu, Ruifeng
This paper presents a compact model that achieves strong sentiment analysis capabilities through targeted distillation from advanced large language models (LLMs). Our methodology decouples the distillation target into two key components: sentiment-related knowledge and task alignment. To transfer these components, we propose a two-stage distillation framework. The first stage, knowledge-driven distillation (KnowDist), transfers sentiment-related knowledge to enhance fundamental sentiment analysis capabilities. The second stage, in-context learning distillation (ICLDist), transfers task-specific prompt-following abilities to optimize task alignment. For evaluation, we introduce SentiBench, a comprehensive sentiment analysis benchmark comprising 3 task categories across 12 datasets. Experiments on this benchmark demonstrate that our model effectively balances model size and performance, showing strong competitiveness compared to existing small-scale LLMs.
- North America > Canada (0.28)
- Asia > China (0.28)
- Asia > Thailand (0.14)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- (2 more...)
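The abstract does not spell out the KnowDist/ICLDist objectives, but distillation frameworks of this kind generally build on a temperature-scaled KL loss between teacher and student output distributions (Hinton-style knowledge distillation). A generic, self-contained illustration of that underlying loss:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)       # soft teacher targets
    q = softmax(student_logits, T)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * T * T

print(distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0 for matching logits
```

Minimizing this loss pushes the compact student toward the teacher's soft output distribution; the paper's two stages presumably differ in what data and prompts the loss is applied over, not sketched here.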
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Zhang, Biao, Liu, Zhongtao, Cherry, Colin, Firat, Orhan
While large language models (LLMs) often adopt finetuning to unlock their capabilities for downstream applications, our understanding of the inductive biases (especially the scaling properties) of different finetuning methods is still limited. To fill this gap, we conduct systematic experiments studying whether and how different scaling factors, including LLM model size, pretraining data size, new finetuning parameter size and finetuning data size, affect the finetuning performance. We consider two types of finetuning - full-model tuning (FMT) and parameter efficient tuning (PET, including prompt tuning and LoRA), and explore their scaling behaviors in the data-limited regime where the LLM model size substantially outweighs the finetuning data size. Based on two sets of pretrained bilingual LLMs from 1B to 16B and experiments on bilingual machine translation and multilingual summarization benchmarks, we find that 1) LLM finetuning follows a power-based multiplicative joint scaling law between finetuning data size and each other scaling factor; 2) LLM finetuning benefits more from LLM model scaling than pretraining data scaling, and PET parameter scaling is generally ineffective; and 3) the optimal finetuning method is highly task- and finetuning data-dependent. We hope our findings could shed light on understanding, selecting and developing LLM finetuning methods.

Advanced LLMs, such as GPT-4 (OpenAI, 2023) and PaLM 2 (Anil et al., 2023), often show emergent capabilities and allow for in-context learning that could use just a few demonstration examples to perform complex reasoning and generation tasks (Wei et al., 2022; Zhang et al., 2023; Fu et al., 2023; Shen et al., 2023). Still, LLM finetuning is required and widely adopted to unlock new and robust capabilities for creative tasks, get the most out of focused downstream tasks, and align model behavior with human preferences (Ouyang et al., 2022; Yang et al., 2023; Gong et al., 2023; Schick et al., 2023). This becomes more significant in traditional industrial applications due to the existence of large-scale annotated task-specific data accumulated over years.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (3 more...)
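A multiplicative joint power law of the kind the abstract describes, loss ≈ A · D^(-α) · X^(-β) (D = finetuning data size, X = another scaling factor such as model size), is linear in log space and can be fit by ordinary least squares. A sketch with synthetic data (the coefficients below are invented, not the paper's fitted values):

```python
# Fit loss ≈ A * D**(-alpha) * X**(-beta) by linear regression in log space:
#   log(loss) = log(A) - alpha*log(D) - beta*log(X)
import numpy as np

rng = np.random.default_rng(0)
D = rng.uniform(1e3, 1e6, size=200)      # finetuning examples (synthetic)
X = rng.uniform(1e9, 1.6e10, size=200)   # model parameters (synthetic)
true_A, true_alpha, true_beta = 50.0, 0.12, 0.05
loss = true_A * D**-true_alpha * X**-true_beta

features = np.column_stack([np.ones_like(D), np.log(D), np.log(X)])
coef, *_ = np.linalg.lstsq(features, np.log(loss), rcond=None)
logA, alpha, beta = coef[0], -coef[1], -coef[2]
print(round(alpha, 3), round(beta, 3))  # recovers 0.12 0.05
```

With noiseless synthetic data the regression recovers the exponents exactly; on real finetuning runs one would fit the same form to measured losses and compare α across FMT and PET.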
How you can train AI to convert design mockups into HTML and CSS
Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now. In this post, we'll teach a neural network how to code a b...
Turning Design Mockups Into Code With Deep Learning - FloydHub Blog
Within three years deep learning will change front-end development. It will increase prototyping speed and lower the barrier for building software. The field took off last year when Tony Beltramelli introduced the pix2code paper and Airbnb launched sketch2code. Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now.
For The First Time, AI Can Teach Itself Any Language On Earth
To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world – in order, Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese. Google's system uses human-supervised neural networks that compare parallel texts – books and articles that have been previously translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, thus acquiring the ability to quickly translate between them. Sometimes the translations are funny or don't really capture the original meaning but, in general, they are functional and, over time, they're getting better and better.
The 1996 Simon Newcomb Award
His proofs are ingenious, cleverly argued, quite convincing to many of his contemporaries, and utterly wrong. The Simon Newcomb Award is given annually for the silliest published argument attacking AI. Our subject may be unique in the virulence and frequency with which it is attacked, both in the popular media and among the cultured intelligentsia. Recent articles have argued that the very idea of AI reflects a cancer in the heart of our culture and have proven (yet again) that it is impossible. While many of these attacks are cited widely, most of them are ridiculous to anyone with an appropriate technical education.
Learning Language Using a Pattern Recognition Approach
IBM Palo Alto Scientific Center, 2530 Page Mill Road, Palo Alto, CA 94303
Abstract: A pattern recognition algorithm is described that learns a transition net grammar from positive examples. Two sets of examples, one in English and one in Chinese, are presented. It is hoped that language learning will reduce the knowledge acquisition effort for expert systems and make the natural language interface to database systems more transportable. The algorithm presented makes a step in that direction by providing a robust parser and reducing special interaction for introduction of new words and terms. We are developing a natural language interface to an expert system for message processing.
Knowledge Interchange Format: The KIF of Death
There has been a flurry of interest recently in the possibility of standardizing existing work on knowledge representation; this interest is supported by the Defense Advanced Research Projects Agency (DARPA) and other funding agencies. An examination of recent work on knowledge representation makes it clear that there are deep differences among the approaches taken. Those supporting knowledge representation standards are attempting to address this difficulty by creating a single language in which all knowledge representation schemes can be expressed (Genesereth 1990), but this task seems impossible given the current state of the field. Even if it were achievable for current schemes, it is surely not possible to construct a language that will also incorporate all future knowledge representation work, other than in the trivial sense guaranteed by the universality of some specific method, such as first-order logic or a general-purpose programming language. Furthermore, attempts in this direction will inevitably constrain future knowledge representation efforts; even gentle constraints might have a stifling impact on future knowledge representation work.
- Government > Regional Government > North America Government > US Government (0.89)
- Government > Military (0.89)