Supplementary Materials

Neural Information Processing Systems

Finally, the data was subsampled by a factor of 2.

Data augmentation

TX features were augmented by adding two types of artificial noise. Each session day has its own affine transform layer.

RNN training hyperparameters

The hyperparameters for RNN training are listed in Table 1.

Table 1: RNN training hyperparameters

Description                            Value
Learning rate                          0.01
Batch size                             48
Number of training batches             20000
Number of hidden units in the GRU      512
Number of GRU layers                   2
Dropout rate in the GRU                0.4
Optimizer                              Adam
Learning rate decay schedule           Linear
L2 weight regularization               1e-5
Maximum gradient norm for clipping     10

1.2 Language model training details

Out-of-vocabulary words were mapped to a special token. In our case, T contains all 26 English letters, 5 punctuation marks, and the CTC blank symbol.
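The configuration above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the layer sizes, dropout, optimizer settings, and output vocabulary size (26 letters + 5 punctuation marks + CTC blank = 32) come from the text, while the input feature dimension and the single per-day affine layer shown here are assumptions.

```python
import torch
import torch.nn as nn

N_CLASSES = 26 + 5 + 1  # letters, punctuation marks, CTC blank


class DecoderRNN(nn.Module):
    def __init__(self, input_dim=512):
        super().__init__()
        # The text says each session day has its own affine transform
        # layer; a single one is shown here for brevity.
        self.day_affine = nn.Linear(input_dim, input_dim)
        self.gru = nn.GRU(input_dim, hidden_size=512, num_layers=2,
                          dropout=0.4, batch_first=True)
        self.out = nn.Linear(512, N_CLASSES)

    def forward(self, x):
        x = self.day_affine(x)
        h, _ = self.gru(x)
        return self.out(h)


model = DecoderRNN()
opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-5)
# Gradient clipping per Table 1 would be applied each step, e.g.:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)
```

The linear learning-rate decay from Table 1 could be added with a `torch.optim.lr_scheduler.LinearLR` schedule over the 20000 training batches.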





BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Neural Information Processing Systems

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive.





Competition and Attraction Improve Model Fusion

Abrantes, João, Lange, Robert Tjarko, Tang, Yujin

arXiv.org Artificial Intelligence

Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. However, existing methods require manually partitioning model parameters into fixed groups for merging, which restricts the exploration of potential combinations and limits performance. To overcome these limitations, we propose Model Merging of Natural Niches (M2N2), an evolutionary algorithm with three key features: (1) dynamic adjustment of merging boundaries to progressively explore a broader range of parameter combinations; (2) a diversity preservation mechanism inspired by the competition for resources in nature, to maintain a population of diverse, high-performing models that are particularly well-suited for merging; and (3) a heuristic-based attraction metric to identify the most promising pairs of models for fusion. Our experimental results demonstrate, for the first time, that model merging can be used to evolve models entirely from scratch. Specifically, we apply M2N2 to evolve MNIST classifiers from scratch and achieve performance comparable to CMA-ES, while being computationally more efficient. Furthermore, M2N2 scales to merge specialized language and image generation models, achieving state-of-the-art performance. Notably, it preserves crucial model capabilities beyond those explicitly optimized by the fitness function, highlighting its robustness and versatility. Our code is available at https://github.com/SakanaAI/natural_niches
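The three ingredients in the abstract can be illustrated with a toy evolutionary loop on flat parameter vectors. This is a simplified sketch of the general idea only, not the authors' algorithm: the quadratic fitness function, the random split-point crossover standing in for dynamic merging boundaries, and the distance-based mate choice standing in for the attraction metric are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def fitness(theta):
    # Toy objective standing in for task performance: higher is better.
    return -np.sum((theta - 1.0) ** 2)


def merge(parent_a, parent_b):
    # Dynamic merging boundary (simplified): a random split point and
    # mixing ratio, rather than fixed parameter groups.
    split = int(rng.integers(1, parent_a.size))
    w = rng.uniform()
    child = parent_a.copy()
    child[split:] = w * parent_a[split:] + (1 - w) * parent_b[split:]
    return child


# Population of random "models" (flat parameter vectors).
pop = [rng.normal(size=8) for _ in range(16)]
init_best = max(fitness(p) for p in pop)

for _ in range(200):
    scores = np.array([fitness(p) for p in pop])
    # Attraction heuristic (simplified): pair the fittest model with the
    # most dissimilar mate, encouraging complementary merges.
    best = pop[int(np.argmax(scores))]
    dists = [np.linalg.norm(best - p) for p in pop]
    mate = pop[int(np.argmax(dists))]
    child = merge(best, mate)
    # Replace the worst individual only if the child improves on it,
    # a crude stand-in for diversity-preserving competition.
    worst = int(np.argmin(scores))
    if fitness(child) > scores[worst]:
        pop[worst] = child

final_best = max(fitness(p) for p in pop)
```

Because the fittest individual is never displaced by a worse child, the best fitness in the population is non-decreasing over generations.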


OntoTune: Ontology-Driven Self-training for Aligning Large Language Models

Liu, Zhiqiang, Gan, Chengtao, Wang, Junjie, Zhang, Yichi, Bo, Zhongpu, Sun, Mengshu, Chen, Huajun, Zhang, Wen

arXiv.org Artificial Intelligence

Existing domain-specific Large Language Models (LLMs) are typically developed by fine-tuning general-purpose LLMs on large-scale domain-specific corpora. However, training on such corpora often fails to effectively organize the LLM's domain knowledge, leading to fragmented understanding. Inspired by how humans connect concepts and organize knowledge through mind maps, we aim to emulate this approach by using an ontology with hierarchical conceptual knowledge to reorganize the LLM's domain knowledge. From this perspective, we propose OntoTune, an ontology-driven self-training framework that aligns LLMs with an ontology through in-context learning, enabling the generation of responses guided by the ontology. We leverage in-context learning to identify whether the LLM has acquired a specific concept's ontology knowledge, and select the entries not yet mastered by the LLM as the training set to further align the LLM with the ontology. Compared to existing domain LLMs built on newly collected large-scale domain-specific corpora, OntoTune relies only on an existing, long-developed ontology and the LLM itself, which significantly reduces data maintenance costs and offers improved generalization ability. We conduct our study in the medical domain, using the standardized medical ontology SNOMED CT as our ontology source. Experimental results demonstrate that OntoTune achieves state-of-the-art performance on both the in-ontology hypernym discovery task and the out-of-ontology medical domain QA task. Moreover, compared to TaxoLLaMA, the latest direct ontology injection method, OntoTune better preserves the LLM's original knowledge. The code and data are available at https://github.com/zjukg/OntoTune.
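The selection step described in the abstract, keeping only ontology entries the model has not yet mastered, can be sketched as follows. This is an illustrative sketch only: the `query_llm` stub, the (concept, hypernym) entry format, the prompt wording, and the substring-based mastery check are all hypothetical stand-ins, not OntoTune's actual implementation.

```python
def query_llm(prompt):
    # Hypothetical stand-in for a call to the LLM being tuned; a real
    # implementation would invoke the model with an in-context prompt.
    raise NotImplementedError


def select_training_set(ontology_entries, llm=query_llm):
    """Keep only entries whose ontology knowledge the LLM has not yet
    mastered; these form the self-training set."""
    training_set = []
    for concept, hypernym in ontology_entries:
        # In-context probe: ask the model for the concept's parent.
        answer = llm(f"In SNOMED CT, what is the hypernym of '{concept}'?")
        # Crude mastery check (assumption): the expected hypernym must
        # appear in the model's answer.
        if answer is None or hypernym.lower() not in answer.lower():
            training_set.append((concept, hypernym))  # not yet mastered
    return training_set
```

Entries that pass the probe are discarded, so later fine-tuning concentrates on the parts of the ontology the model gets wrong.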