Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
Feng, Fu, Xie, Yucheng, Yang, Xu, Wang, Jing, Geng, Xin
WAVE: Weight Template for Adaptive Initialization of Variable-sized Models
Feng, Fu, Xie, Yucheng, Wang, Jing, Geng, Xin
The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment call for models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with the pre-trained models. We tackle this issue from a multitasking perspective and introduce \textbf{WAVE}, which incorporates a set of shared \textbf{W}eight templates for \textbf{A}daptive initialization of \textbf{V}ariable-siz\textbf{E}d Models. During initialization, target models initialize weight scalers tailored to their model size, which suffice to learn, from a limited amount of data, the rules that connect the weight templates via the Kronecker product. To construct the weight templates, WAVE adopts the \textit{Learngene} framework, which structurally condenses common knowledge from ancestor models into weight templates (the learngenes) through knowledge distillation. This process integrates the pre-trained models' knowledge into a structured form governed by the connection rules of the weight templates. We provide a comprehensive benchmark for learngenes, and extensive experiments demonstrate the efficacy of WAVE. The results show that WAVE achieves state-of-the-art performance when initializing models of various depths and widths, and even outperforms directly pre-training $n$ entire models, particularly for smaller models, saving approximately $n\times$ and $5\times$ in computational and storage resources, respectively. WAVE simultaneously achieves the most efficient knowledge transfer across a series of datasets, with average improvements of 1.8\% and 1.2\% on 7 downstream datasets.
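The core mechanism described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: it only shows how a Kronecker product lets one small shared template, combined with a per-model scaler, produce weight matrices of many sizes.

```python
import numpy as np

# Illustrative sketch (names and shapes are assumptions, not WAVE's API):
# a target-layer weight is composed as the Kronecker product of a
# model-size-specific scaler with a shared weight template.

def init_weight(template: np.ndarray, scaler: np.ndarray) -> np.ndarray:
    """Build a target weight matrix from a shared template and a
    scaler chosen to match the target layer's size."""
    # kron of an (a, b) scaler with an (m, n) template gives (a*m, b*n)
    return np.kron(scaler, template)

rng = np.random.default_rng(0)
template = rng.standard_normal((4, 4))   # shared across all target models
scaler = rng.standard_normal((3, 2))     # learned per target model size
w = init_weight(template, scaler)
assert w.shape == (12, 8)                # one template, many layer sizes
```

Because only the small scalers are model-specific, the shared templates can be stored once and reused across variable-sized target models, which is where the claimed storage savings come from.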
Transferring Core Knowledge via Learngenes
Feng, Fu, Wang, Jing, Geng, Xin
The pre-training paradigm fine-tunes models trained on large-scale datasets to downstream tasks for enhanced performance. It transfers all knowledge to downstream tasks without discriminating which parts are necessary, which may lead to negative transfer. In comparison, knowledge transfer in nature is far more efficient: when passing genetic information to descendants, ancestors encode only the essential knowledge into genes, which act as the medium. Inspired by this, we adopt the recent concept of the ``learngene'' and refine its structure by mimicking that of natural genes. We propose Genetic Transfer Learning (GTL), a framework that replicates the evolutionary process of organisms in neural networks. GTL trains a population of networks, selects superior learngenes by tournament, applies mutations to the learngenes, and passes them to the next generation. Finally, we successfully extract the learngenes of VGG11 and ResNet12. We show that the learngenes endow descendant networks with instincts and strong learning ability: with 20% of the parameters, they bring 12% and 16% improvements in accuracy on CIFAR-FS and miniImageNet, respectively. Moreover, the learngenes are scalable and adaptable to downstream network structures and datasets. Overall, we offer the novel insight that transferring core knowledge via learngenes may be both sufficient and efficient for neural networks.
Genes in Intelligent Agents
Feng, Fu, Wang, Jing, Yang, Xu, Geng, Xin
The genes in nature have given life on Earth its current biological intelligence through transmission and accumulation over billions of years. Inspired by biological intelligence, artificial intelligence (AI) has been devoted to building machine intelligence. Although it has achieved thriving successes, machine intelligence still lags far behind biological intelligence. The reason may be that animals are born with some intelligence encoded in their genes, whereas machines lack such intelligence and must learn from scratch. Inspired by the genes of animals, we define the ``genes'' of machines, named ``learngenes'', and propose Genetic Reinforcement Learning (GRL). GRL is a computational framework that simulates the evolution of organisms within reinforcement learning (RL) and leverages the learngenes to learn and evolve intelligent agents. Leveraging GRL, we first show that the learngenes take the form of fragments of the agents' neural networks and can be inherited across generations. Second, we validate that the learngenes transfer ancestral experience to the agents, bringing them instincts and strong learning abilities. Third, we demonstrate the Lamarckian inheritance of the intelligent agents and the continuous evolution of the learngenes. Overall, the learngenes take machine intelligence one step closer to biological intelligence.
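The idea of a learngene as an inheritable network fragment can be sketched briefly. This is an illustrative assumption, not GRL's implementation: an agent is modeled as a list of weight matrices, and inheritance copies one designated fragment (here, the first layer) from ancestor to a freshly initialized descendant.

```python
import numpy as np

def new_agent(sizes, rng):
    # An agent is just a list of weight matrices (one per layer) here.
    return [rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]

def inherit(ancestor, descendant, fragment_layers=(0,)):
    # Copy the learngene fragment from the ancestor into the descendant;
    # the remaining layers keep their fresh initialization.
    for i in fragment_layers:
        descendant[i] = ancestor[i].copy()
    return descendant

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 2]
parent = new_agent(sizes, rng)
child = inherit(parent, new_agent(sizes, rng))
assert np.array_equal(child[0], parent[0])      # fragment is inherited
assert not np.array_equal(child[1], parent[1])  # rest starts from scratch
```

The inherited fragment plays the role of an instinct: the descendant starts with ancestral knowledge in that fragment while the rest of its network is learned anew through RL.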