ancestry model
Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers
In recent years, the combination of vast datasets and powerful computational resources has led to the emergence of large pre-trained models in deep learning. However, a single large model does not fit the resource budget of every downstream task. To mitigate this issue, we propose Cluster-Learngene, which clusters critical internal modules of a large ancestry model and then inherits them to initialize descendant models of elastic scales. Specifically, based on the density characteristics of attention heads, our method adaptively clusters the attention heads of each layer and the position-wise feed-forward networks (FFNs) in the ancestry model to form the learngene. Moreover, we introduce priority weight-sharing and learnable parameter transformations that expand the learngene to initialize descendant models of elastic scales. Through extensive experiments, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models of elastic scales according to the resources available to downstream tasks.
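The clustering and expansion steps described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes each attention head is summarized by one scalar statistic (the `scores` values are made up), uses a simple one-dimensional gap rule as a stand-in for density-based clustering, and approximates priority weight-sharing by letting larger clusters fill the descendant's head slots first. The function names and the `eps` threshold are hypothetical.

```python
def cluster_heads(head_scores, eps):
    """Group heads whose sorted scores are chained by gaps of at most eps."""
    order = sorted(range(len(head_scores)), key=lambda i: head_scores[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if head_scores[cur] - head_scores[prev] <= eps:
            clusters[-1].append(cur)   # dense neighbour: same cluster
        else:
            clusters.append([cur])     # large gap: start a new cluster
    return clusters


def centroid_head(cluster, head_scores):
    """Pick the head whose score is closest to the cluster mean."""
    mean = sum(head_scores[i] for i in cluster) / len(cluster)
    return min(cluster, key=lambda i: abs(head_scores[i] - mean))


def expand_by_priority(clusters, head_scores, num_descendant_heads):
    """Assign centroid heads to descendant slots, largest clusters first."""
    ranked = sorted(clusters, key=len, reverse=True)
    centroids = [centroid_head(c, head_scores) for c in ranked]
    return [centroids[k % len(centroids)] for k in range(num_descendant_heads)]


# Hypothetical per-head statistics for one layer with six heads.
scores = [0.125, 0.25, 0.1875, 1.0, 1.125, 2.0]
clusters = cluster_heads(scores, eps=0.125)
layout = expand_by_priority(clusters, scores, num_descendant_heads=4)
print(clusters)  # indices of ancestry heads, grouped by cluster
print(layout)    # which centroid head initializes each descendant head
```

Each entry of `layout` names the ancestry head whose weights would initialize the corresponding descendant head; a descendant with more heads than there are clusters reuses centroids, starting with those from the largest clusters.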
Building Variable-sized Models via Learngene Pool
Shi, Boyu, Xia, Shiyu, Yang, Xu, Chen, Haokun, Kou, Zhiqiang, Geng, Xin
Recently, Stitchable Neural Networks (SN-Net) was proposed to stitch several pre-trained networks together, quickly building numerous networks with different complexity and performance trade-offs. This alleviates the burden of designing or training variable-sized networks for application scenarios with diverse resource constraints. However, SN-Net still faces a few challenges. 1) Stitching from multiple independently pre-trained anchors incurs high storage consumption. 2) SN-Net struggles to build smaller models for low resource constraints. 3) SN-Net uses an unlearned initialization method for its stitch layers, which limits the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed the learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one pre-trained large model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store multiple large models as SN-Net does, and after distillation, smaller learngene instances can be created to build small models that satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models, which improves the performance of these models. Extensive experiments have been conducted, and the results validate the effectiveness of the proposed Learngene Pool compared with SN-Net.
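To make the idea of a learnable transformation matrix between two instances concrete, here is a minimal sketch in plain Python. It assumes a stitch layer is a single linear map from a width-2 block output to a width-3 block input and fits it by gradient descent on paired activations; this activation-matching objective, the toy data, and the names `matvec` and `fit_stitch` are illustrative assumptions, not the training procedure of the paper.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def fit_stitch(inputs, targets, lr=0.1, steps=500):
    """Fit T (out_dim x in_dim) by gradient descent so that T @ x is close to y."""
    in_dim, out_dim = len(inputs[0]), len(targets[0])
    T = [[0.0] * in_dim for _ in range(out_dim)]
    for _ in range(steps):
        for x, y in zip(inputs, targets):
            pred = matvec(T, x)
            for r in range(out_dim):
                err = pred[r] - y[r]            # gradient of 0.5 * err**2
                for c in range(in_dim):
                    T[r][c] -= lr * err * x[c]
    return T


# Hypothetical activations: outputs of a narrow instance (width 2) paired with
# the inputs a wider instance (width 3) should receive for the same samples.
narrow_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wide_in = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]

stitch = fit_stitch(narrow_out, wide_in)
stitched = matvec(stitch, [2.0, 3.0])  # pass a new activation through the stitch
print([round(v, 3) for v in stitched])
```

Because the matrix is fitted rather than fixed, the same pair of instances can be re-stitched for a different width pairing by refitting only this small map.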
Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models
Wang, Qiufeng, Yang, Xu, Lin, Shuxia, Wang, Jing, Geng, Xin
During the continuous evolution of an organism's ancestry, its genes accumulate extensive experience and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm, Learngene, which enables learning models to incorporate three key characteristics of genes. (i) Accumulating: knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., the learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms such as large-scale pre-training and lifelong learning, we focus on condensing and inheriting. These raise three key issues, for which we provide preliminary solutions in this paper. (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers of the ancestry model are most similar to a pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for specific downstream tasks, we stack some randomly initialized layers onto the learngene layers. Extensive experiments across various settings, including different network architectures such as Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) on different datasets, confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.
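The inheriting step, stacking randomly initialized layers onto the learngene layers, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: layers are plain dictionaries holding a square weight matrix, the new layers are placed before the inherited ones (the ordering is an assumption), and `random_layer`, `build_descendant`, and the initialization scale 0.02 are hypothetical choices.

```python
import random


def random_layer(width, rng):
    """Create a freshly initialized layer (small Gaussian weights)."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(width)] for _ in range(width)]
    return {"weights": weights, "inherited": False}


def build_descendant(learngene_layers, num_new_layers, width, seed=0):
    """Stack new random layers with copies of the learngene layers."""
    rng = random.Random(seed)
    new_layers = [random_layer(width, rng) for _ in range(num_new_layers)]
    inherited = [
        # Copy the weights so training the descendant leaves the learngene intact.
        {"weights": [row[:] for row in layer["weights"]], "inherited": True}
        for layer in learngene_layers
    ]
    return new_layers + inherited


# Hypothetical learngene: two width-3 layers taken from an ancestry model.
learngene = [
    {"weights": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
    {"weights": [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]},
]

descendant = build_descendant(learngene, num_new_layers=3, width=3)
print([layer["inherited"] for layer in descendant])
```

Changing `num_new_layers` yields descendants of different depths from the same learngene, which is the sense in which distinct descendant models are built for different downstream tasks.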