reference task
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Louisiana (0.04)
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
Experiments of pretraining 410M and 1B models on the C4 dataset demonstrate that MA TES significantly outperforms random data selection on extensive downstream tasks. It doubles the gains achieved by the state-of-the-art data selection approach that leverages larger reference models and reduces the total FLOPs required to reach certain performances by half. Further analyses validate the effectiveness of the locally probed oracle data influence and the approximation with data influence models. Our code is open-sourced at https://github.com/cxcscmu/MA
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan (0.04)
- Europe (0.04)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
Synthesizing Tasks for Block-based Programming
Block-based visual programming environments play a critical role in introducing computing concepts to K-12 students. One of the key pedagogical challenges in these environments is in designing new practice tasks for a student that match a desired level of difficulty and exercise specific programming concepts.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Louisiana (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan (0.04)
- Europe (0.04)
- Asia > Singapore (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Singapore (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging
Zhao, Yiran, Zhang, Wenxuan, Wang, Huiming, Kawaguchi, Kenji, Bing, Lidong
As an effective alternative to the direct fine-tuning on target tasks in specific languages, cross-lingual transfer addresses the challenges of limited training data by decoupling ''task ability'' and ''language ability'' by fine-tuning on the target task in the source language and another selected task in the target language, respectively. However, they fail to fully separate the task ability from the source language or the language ability from the chosen task. In this paper, we acknowledge the mutual reliance between task ability and language ability and direct our attention toward the gap between the target language and the source language on tasks. As the gap removes the impact of tasks, we assume that it remains consistent across tasks. Based on this assumption, we propose a new cross-lingual transfer method called $\texttt{AdaMergeX}$ that utilizes adaptive adapter merging. By introducing a reference task, we can determine that the divergence of adapters fine-tuned on the reference task in both languages follows the same distribution as the divergence of adapters fine-tuned on the target task in both languages. Hence, we can obtain target adapters by combining the other three adapters. Furthermore, we propose a structure-adaptive adapter merging method. Our empirical results demonstrate that our approach yields new and effective cross-lingual transfer, outperforming existing methods across all settings.
- Asia > Singapore (0.04)
- North America > Dominican Republic (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)