EfficientFormer: Vision Transformers at MobileNet Speed

Neural Information Processing Systems

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., the attention mechanism, ViT-based models are generally many times slower than lightweight convolutional networks. Therefore, deploying ViT for real-time applications is particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computational complexity of ViT through network architecture search or hybrid design with MobileNet blocks, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance?
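The abstract attributes much of the ViT latency gap to the attention mechanism. As a minimal illustration (not from the paper, and with toy dimensions chosen here for the example), the numpy sketch below computes scaled dot-product self-attention over N tokens; the intermediate N x N score matrix is the source of the quadratic cost in the number of tokens.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N tokens.

    The (N, N) score matrix built here grows quadratically with the
    token count, which is the cost the abstract alludes to.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])          # shape (N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n_tokens, dim = 196, 64   # e.g. a 14x14 patch grid; widths are toy values
x = rng.standard_normal((n_tokens, dim))
w = [rng.standard_normal((dim, dim)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)          # (196, 64); the score matrix was (196, 196)
```

A depthwise convolution over the same grid would instead touch only a fixed-size neighborhood per token, which is one reason lightweight CNNs stay fast as resolution grows.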


Appendix A Latency Driven Slimming Algorithm

Neural Information Processing Systems

We provide the details of the proposed latency-driven fast slimming in Alg. 1. Our major conclusions and speed analysis can be found in Sec. 3 and Figure 1. Compared to non-overlapping large-kernel patch embedding (V5 in Tab. 3), MHSA with the global receptive field is an essential contribution to model performance. By comparing V1 and V2 in Tab. 3, we can observe the effect of GN. We explore ReLU and HardSwish (V3 and V4 in Tab. 3) in addition to GeLU. We conclude that the activation function can be selected on a case-by-case basis depending on the specific hardware and compiler. In this work, we use GeLU, which provides better performance than ReLU while still executing fast.
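The general shape of a latency-driven slimming pass can be sketched as a greedy loop: while the network exceeds a latency budget, drop the block with the worst importance-per-millisecond ratio. This is a simplified stand-in for Alg. 1, not the paper's actual procedure; the block names, latencies, and importance scores below are hypothetical toy values.

```python
# Hypothetical greedy latency-driven slimming sketch: repeatedly drop
# the block contributing the least importance per millisecond until
# the network fits a latency budget. All numbers are toy stand-ins.
def slim_to_budget(blocks, budget_ms):
    """blocks: list of (name, latency_ms, importance) tuples."""
    kept = list(blocks)
    total = sum(lat for _, lat, _ in kept)
    while total > budget_ms and len(kept) > 1:
        # candidate with the worst accuracy-per-latency trade-off
        worst = min(kept, key=lambda b: b[2] / b[1])
        kept.remove(worst)
        total -= worst[1]
    return kept, total

blocks = [("stem", 0.4, 5.0), ("mb4d_1", 0.6, 1.2),
          ("mb4d_2", 0.6, 0.9), ("mhsa_1", 1.1, 3.0),
          ("mhsa_2", 1.1, 0.8)]
kept, total = slim_to_budget(blocks, budget_ms=2.5)
print([name for name, _, _ in kept], total)
```

In a real setting, the latencies would come from on-device profiling and the importance scores from an accuracy proxy, which is what makes the slimming "latency-driven" rather than FLOP-driven.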


EfficientFormer: Vision Transformers at MobileNet Speed (Yanyu Li)

Neural Information Processing Systems

Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm. Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer.


Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping

Jamali, Ali, Roy, Swalpa Kumar, Hong, Danfeng, Atkinson, Peter M, Ghamisi, Pedram

arXiv.org Artificial Intelligence

Convolutional Neural Networks (CNNs) are models used extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use and land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN- and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN- and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, EfficientFormer, iFormer and ResNet by approximately 15%, 19%, 20%, 21% and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP
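The spatial gating unit mentioned in the abstract follows the gMLP family of designs: split the channels in half, mix one half across the spatial (token) axis with a learned projection, and use it to gate the other half elementwise. The numpy sketch below illustrates that pattern under assumed shapes; the weights, the near-identity initialization, and all dimensions are illustrative, not taken from the SGU-MLP paper.

```python
import numpy as np

def spatial_gating_unit(x, w_spatial, b_spatial):
    """gMLP-style spatial gating (illustrative sketch).

    x: (n_tokens, channels) with an even channel count.
    The second channel half is mixed across tokens, then gates the
    first half elementwise.
    """
    u, v = np.split(x, 2, axis=-1)
    v = w_spatial @ v + b_spatial    # projection along the token axis
    return u * v                      # elementwise gate

rng = np.random.default_rng(1)
n, c = 49, 32                         # e.g. a 7x7 patch grid, toy width
x = rng.standard_normal((n, c))
# near-identity spatial weights and a bias near 1 keep the gate
# close to a pass-through at initialization, as in gMLP
w = np.eye(n) + 0.01 * rng.standard_normal((n, n))
b = np.ones((n, 1))
out = spatial_gating_unit(x, w, b)
print(out.shape)                      # (49, 16)
```

Because the spatial projection is a dense map over tokens rather than pairwise attention, the gating cost is a single (n, n) matrix multiply, which is part of why MLP-style models can be attractive when training data are limited.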


Asif Razzaq on LinkedIn: #tech #ai #artificialintelligence

#artificialintelligence

Snap and Northeastern University Researchers Propose EfficientFormer: A Vision Transformer That Runs As Fast As MobileNet While Maintaining High Performance. In natural language processing, the Transformer is a unique design that seeks to solve sequence-to-sequence tasks while also resolving long-range dependencies. Vision Transformers (ViT) have demonstrated excellent results on computer vision benchmarks in recent years. On the other hand, they are usually many times slower than lightweight convolutional networks because of their large number of parameters and model architecture choices such as the attention mechanism. As a result, deploying ViT for real-time applications is difficult, especially on hardware with limited resources, such as mobile devices. Snap Inc. and Northeastern University collaborated on a new study that answers this fundamental question and suggests a new ViT paradigm.