AITopics | learngene

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

Neural Information Processing SystemsMar-20-2026, 10:15:57 GMT

In practical scenarios, it is necessary to build variable-sized models to accommodate diverse resource constraints, where weight initialization serves as a crucial step preceding training. The recently introduced Learngene framework firstly learns one compact module, termed learngene, from a large well-trained model, and then transforms learngene to initialize variable-sized models. However, the existing Learngene methods provide limited guidance on transforming learngene, where transformation mechanisms are manually designed and generally lack a learnable component. Moreover, these methods only consider transforming learngene along depth dimension, thus constraining the flexibility of learngene. Motivated by these concerns, we propose a novel and effective Learngene approach termed LeTs (Learnable Transformation), where we transform the learngene module along both width and depth dimension with a set of learnable matrices for flexible variablesized model initialization. Specifically, we construct an auxiliary model comprising the compact learngene module and learnable transformation matrices, enabling both components to be trained. To meet the varying size requirements of target models, we select specific parameters from well-trained transformation matrices to adaptively transform the learngene, guided by strategies such as continuous selection and magnitude-wise selection. Extensive experiments on ImageNet-1K demonstrate that Des-Nets initialized via LeTs outperform those with 100-epoch from scratch training after only 1 epoch tuning. When transferring to downstream image classification tasks, LeTs achieves better results while outperforming from scratch training after about 10 epochs within a 300-epoch training schedule.

artificial intelligence, learngene, machine learning, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.39)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

4c5e2bcbf21bdf40d75fddad0bd43dc9-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 17:22:46 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

3a1fc7b8e200a45110872c56f0569f61-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 09:37:28 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education (0.46)
Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(4 more...)

Add feedback

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

Neural Information Processing SystemsDec-25-2025, 02:00:50 GMT

Vision Transformers (ViTs) are widely used in a variety of applications, while they usually have a fixed architecture that may not match the varying computational resources of different deployment environments. Thus, it is necessary to adapt ViT architectures to devices with diverse computational overheads to achieve an accuracy-efficient trade-off. This concept is consistent with the motivation behind Learngene. To achieve this, inspired by polynomial decomposition in calculus, where a function can be approximated by linearly combining several basic components, we propose to linearly decompose the ViT model into a set of components called learngenes during element-wise training. These learngenes can then be recomposed into differently scaled, pre-initialized models to satisfy different computational resource constraints. Such a decomposition-recomposition strategy provides an economical and flexible approach to generating different scales of ViT models for different deployment scenarios. Compared to model compression or training from scratch, which require to repeatedly train on large datasets for diverse-scale models, such strategy reduces computational costs since it only requires to train on large datasets once. Extensive experiments are used to validate the effectiveness of our method: ViTs can be decomposed and the decomposed learngenes can be recomposed into diverse-scale ViTs, which can achieve comparable or better performance compared to traditional model compression and pre-training methods. The code for our experiments is available in the supplemental material.

artificial intelligence, decomposing and recomposing vision transformer, linearly decomposing, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.66)

Add feedback

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

Neural Information Processing SystemsOct-11-2025, 00:21:48 GMT

Moreover, the training and storage costs of such method grow linearly with the number of potential scenarios.

initialization, learngene, transformation matrix, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models Shuxia Lin

Neural Information Processing SystemsOct-9-2025, 23:39:04 GMT

Vision Transformers (ViTs) are widely used in a variety of applications, while they usually have a fixed architecture that may not match the varying computational resources of different deployment environments.

const, learngene, vit model, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education (0.46)
Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

2e53c02ea028cbf603f4b6b47fef3d97-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 22:13:51 GMT

attention head, centroid, descendant model, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(4 more...)

Add feedback

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

Neural Information Processing SystemsMay-27-2025, 00:39:31 GMT

In practical scenarios, it is necessary to build variable-sized models to accommodate diverse resource constraints, where weight initialization serves as a crucial step preceding training. The recently introduced Learngene framework firstly learns one compact module, termed learngene, from a large well-trained model, and then transforms learngene to initialize variable-sized models. However, the existing Learngene methods provide limited guidance on transforming learngene, where transformation mechanisms are manually designed and generally lack a learnable component. Moreover, these methods only consider transforming learngene along depth dimension, thus constraining the flexibility of learngene. Motivated by these concerns, we propose a novel and effective Learngene approach termed LeTs (Learnable Transformation), where we transform the learngene module along both width and depth dimension with a set of learnable matrices for flexible variablesized model initialization.

artificial intelligence, learnable transformation, learngene, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.53)

Add feedback

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

Neural Information Processing SystemsMay-26-2025, 21:42:57 GMT

Vision Transformers (ViTs) are widely used in a variety of applications, while they usually have a fixed architecture that may not match the varying computational resources of different deployment environments. Thus, it is necessary to adapt ViT architectures to devices with diverse computational overheads to achieve an accuracy-efficient trade-off. This concept is consistent with the motivation behind Learngene. To achieve this, inspired by polynomial decomposition in calculus, where a function can be approximated by linearly combining several basic components, we propose to linearly decompose the ViT model into a set of components called learngenes during element-wise training. These learngenes can then be recomposed into differently scaled, pre-initialized models to satisfy different computational resource constraints.

artificial intelligence, decomposing and recomposing vision transformer, linearly decomposing, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.64)

Add feedback

Harmonizing Generalization and Personalization in Ring-topology Decentralized Federated Learning

Guo, Shunxin, Lv, Jiaqi, Geng, Xin

arXiv.org Artificial IntelligenceApr-29-2025

We introduce Ring-topology Decentralized Federated Learning (RDFL) for distributed model training, aiming to avoid the inherent risks of centralized failure in server-based FL. However, RDFL faces the challenge of low information-sharing efficiency due to the point-to-point communication manner when handling inherent data heterogeneity. Existing studies to mitigate data heterogeneity focus on personalized optimization of models, ignoring that the lack of shared information constraints can lead to large differences among models, weakening the benefits of collaborative learning. To tackle these challenges, we propose a Divide-and-conquer RDFL framework (DRDFL) that uses a feature generation model to extract personalized information and invariant shared knowledge from the underlying data distribution, ensuring both effective personalization and strong generalization. Specifically, we design a \textit{PersonaNet} module that encourages class-specific feature representations to follow a Gaussian mixture distribution, facilitating the learning of discriminative latent representations tailored to local data distributions. Meanwhile, the \textit{Learngene} module is introduced to encapsulate shared knowledge through an adversarial classifier to align latent representations and extract globally invariant information. Extensive experiments demonstrate that DRDFL outperforms state-of-the-art methods in various data heterogeneity settings.

artificial intelligence, federated learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2504.19103

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Filters

Collaborating Authors

learngene

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

4c5e2bcbf21bdf40d75fddad0bd43dc9-Paper-Conference.pdf

3a1fc7b8e200a45110872c56f0569f61-Paper-Conference.pdf

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models Shuxia Lin

2e53c02ea028cbf603f4b6b47fef3d97-Paper-Conference.pdf

Initializing Variable-sized Vision Transformers from Learngene with Learnable Transformation

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

Harmonizing Generalization and Personalization in Ring-topology Decentralized Federated Learning