DEM: Distribution Edited Model for Training with Mixed Data Distributions
Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha
arXiv.org Artificial Intelligence
Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, but they achieve sub-optimal performance across data sources and require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization over the data sources: combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, the Distribution Edited Model (DEM), is 11x cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding up to 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, and 9.3% on HELM with models of size 3B to 13B. Notably, DEM does not require full re-training when a single data source is modified, making it very flexible and scalable for training with diverse data sources.
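The element-wise combination the abstract describes can be sketched as a task-arithmetic-style merge: subtract the base weights from each per-distribution fine-tuned model to obtain a delta, then add the weighted deltas back onto the base. This is a minimal illustration under that assumption; the function name, the flat-vector representation of model parameters, and the unit weights are all hypothetical, not the paper's exact procedure.

```python
import numpy as np

def distribution_edited_model(base, finetuned_models, weights):
    """Combine per-source fine-tuned parameters with the base model
    using basic element-wise vector operations (sketch)."""
    merged = base.copy()
    for theta, w in zip(finetuned_models, weights):
        # Each fine-tuned model contributes a weighted delta from the base.
        merged += w * (theta - base)
    return merged

# Toy 3-parameter "models": one fine-tuned per data source.
base = np.array([1.0, 2.0, 3.0])
ft_a = np.array([1.5, 2.0, 3.0])  # trained on data source A
ft_b = np.array([1.0, 2.5, 3.5])  # trained on data source B

dem = distribution_edited_model(base, [ft_a, ft_b], weights=[1.0, 1.0])
# dem = base + (ft_a - base) + (ft_b - base) = [1.5, 2.5, 3.5]
```

Because each source contributes an independent delta, swapping out one data source only requires re-training that single model and re-running the cheap merge, which is the flexibility the abstract highlights.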
Jun-21-2024