AITopics | initialization strategy

Structured Initialization for Vision Transformers

Neural Information Processing SystemsJun-22-2026, 14:19:51 GMT

In this paper, we propose integrating this inductive bias into ViTs, not through an architectural intervention but solely through initialization. The motivation here is to have a ViT that can enjoy strong CNN-like performance when data assets are small, but can still scale to ViTlike performance as the data expands. Our approach is motivated by our empirical results that random impulse filters can achieve commensurate performance to learned filters within a CNN. We improve upon current ViT initialization strategies, which typically rely on empirical heuristics such as using attention weights from pretrained models or focusing on the distribution of attention weights without enforcing structures. Empirical results demonstrate that our method significantly outperforms standard ViT initialization across numerous small and medium-scale benchmarks, including Food-101, CIFAR-10, CIFAR-100, STL-10, Flowers, and Pets, while maintaining comparative performance on large-scale datasets such as ImageNet-1K. Moreover, our initialization strategy can be easily integrated into various transformer-based architectures such as Swin Transformer and MLP-Mixer with consistent improvements in performance.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Structured Initialization for Vision Transformers

Neural Information Processing SystemsJun-14-2026, 02:57:35 GMT

In this paper, we propose integrating this inductive bias into ViTs, not through an architectural intervention but solely through initialization. The motivation here is to have a ViT that can enjoy strong CNN-like performance when data assets are small, but can still scale to ViT-like performance as the data expands. Our approach is motivated by our empirical results that random impulse filters can achieve commensurate performance to learned filters within a CNN. We improve upon current ViT initialization strategies, which typically rely on empirical heuristics such as using attention weights from pretrained models or focusing on the distribution of attention weights without enforcing structures. Empirical results demonstrate that our method significantly outperforms standard ViT initialization across numerous small and medium-scale benchmarks, including Food-101, CIFAR-10, CIFAR-100, STL-10, Flowers, and Pets, while maintaining comparative performance on large-scale datasets such as ImageNet-1K. Moreover, our initialization strategy can be easily integrated into various transformer-based architectures such as Swin Transformer and MLP-Mixer with consistent improvements in performance.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Training Your Image Restoration Network Better with Random Weight Network as Optimization Function

Neural Information Processing SystemsApr-24-2026, 06:51:12 GMT

The blooming progress made in deep learning-based image restoration has been largely attributed to the availability of high-quality, large-scale datasets and advanced network structures. However, optimization functions such as L1 and L2 are still de facto. In this study, we propose to investigate new optimization functions to improve image restoration performance. Our key insight is that "random weight network can be acted as a constraint for training better image restoration networks". However, not all random weight networks are suitable as constraints.

artificial intelligence, machine learning, random weight network, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

IQP Born Machines under Data-dependent and Agnostic Initialization Strategies

Lerch, Sacha, Bowles, Joseph, Puig, Ricard, Armengol, Erik, Holmes, Zoë, Thanasilp, Supanut

arXiv.org Machine LearningMar-17-2026

Quantum circuit Born machines based on instantaneous quantum polynomial-time (IQP) circuits are natural candidates for quantum generative modeling, both because of their probabilistic structure and because IQP sampling is provably classically hard in certain regimes. Recent proposals focus on training IQP-QCBMs using Maximum Mean Discrepancy (MMD) losses built from low-body Pauli-$Z$ correlators, but the effect of initialization on the resulting optimization landscape remains poorly understood. In this work, we address this by first proving that the MMD loss landscape suffers from barren plateaus for random full-angle-range initializations of IQP circuits. We then establish lower bounds on the loss variance for identity and an unbiased data-agnostic initialization. We then additionally consider a data-dependent initialization that is better aligned with the target distribution and, under suitable assumptions, yields provable gradients and generally converges quicker to a good minimum (as indicated by our training of circuits with 150 qubits on genomic data). Finally, as a by-product, the developed variance lower bound framework is applicable to a general class of non-linear losses, offering a broader toolset for analyzing warm-starts in quantum machine learning.

artificial intelligence, initialization, machine learning, (18 more...)

arXiv.org Machine Learning

2603.14576

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Government (0.45)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

043f0503c4f652c737add3690aa5d12c-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 21:53:48 GMT

computer vision and pattern recognition, optimization function, random weight network, (10 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Devansh Arpit, Víctor Campos, Yoshua Bengio

Neural Information Processing SystemsFeb-14-2026, 19:36:49 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, initialization, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

f52db9f7c0ae7017ee41f63c2a7353bc-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 03:10:59 GMT

feedforward dim 512, representation dim 256, transformer sublayer, (12 more...)

Neural Information Processing Systems

Genre: Overview (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

419f72cbd568ad62183f8132a3605a2a-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 23:35:24 GMT

acquisition function, implementation, optimization, (14 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
(2 more...)

Add feedback

If You Want to Be Robust, Be Wary of Initialization

Neural Information Processing SystemsFeb-10-2026, 01:35:23 GMT

We introduce a theoretical framework bridging the connection between initialization strategies and a network's resilience to adversarial perturbations.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: