AITopics | coatnet

Collaborating Authors

coatnet

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems, Feb-7-2026, 19:25:16 GMT

However, these approaches are either ad hoc or focused on injecting a particular property, lacking a systematic understanding of the respective roles of convolution and attention when combined.

artificial intelligence, arxivpreprintarxiv, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)


CoAtNet: Marrying Convolution and Attention for All Data Sizes (Zihang Dai et al., Google)

Neural Information Processing Systems, Feb-7-2026, 19:25:12 GMT

artificial intelligence, arxivpreprintarxiv, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)


CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems, Dec-23-2025, 20:57:18 GMT

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks. In this work, we show that while Transformers tend to have larger model capacity, their generalization can be worse than that of convolutional networks due to the lack of the right inductive bias. To effectively combine the strengths of both architectures, we present CoAtNets (pronounced "coat" nets), a family of hybrid models built from two key insights: (1) depthwise Convolution and self-Attention can be naturally unified via simple relative attention; (2) vertically stacking convolution layers and attention layers in a principled way is surprisingly effective in improving generalization, capacity and efficiency. Experiments show that our CoAtNets achieve state-of-the-art performance under different resource constraints across various datasets: without extra data, CoAtNet achieves 86.0% ImageNet top-1 accuracy; when pre-trained with 13M images from ImageNet-21K, CoAtNet achieves 88.56% top-1 accuracy, matching ViT-huge pre-trained with 300M images from JFT-300M while using 23x less data; and when further scaled up with JFT-3B, CoAtNet achieves 90.88% top-1 accuracy on ImageNet, establishing a new state-of-the-art result.
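The first insight, unifying depthwise convolution and self-attention through relative attention, can be illustrated with a small sketch. It assumes the pre-normalization form described in the CoAtNet paper, y_i = Σ_j softmax_j(x_i·x_j + w_{i−j}) x_j, where w is a static, input-independent scalar per relative offset that plays the role of a depthwise convolution kernel, while the dot product x_i·x_j supplies input-adaptive attention weights. The 1-D sequence setup, the function name `relative_attention_1d` and the dictionary of offsets are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def relative_attention_1d(x: Sequence[Vector], w: Dict[int, float]) -> List[Vector]:
    """Hypothetical 1-D relative attention: y_i = sum_j softmax_j(x_i . x_j + w[i - j]) * x_j.

    w maps a relative offset (i - j) to a static scalar bias; missing offsets default to 0.
    """
    n, channels = len(x), len(x[0])
    out: List[Vector] = []
    for i in range(n):
        # Content term (attention) plus static relative-position term (convolution-like kernel).
        logits = [dot(x[i], x[j]) + w.get(i - j, 0.0) for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(weights[j] * x[j][c] for j in range(n)) for c in range(channels)])
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # A large bias at offset 0 acts like an identity kernel: each output is close to its own input.
    print(relative_attention_1d(x, {0: 50.0}))
    # A large bias at offset 1 (j = i - 1) favours the left neighbour, so for i >= 1 output i is
    # close to x[i - 1]; position 0 has no left neighbour and falls back to content-based weights.
    print(relative_attention_1d(x, {1: 50.0}))
```

Because w depends only on the offset, dropping the content term leaves a softmax-normalized convolution kernel; keeping it lets the weights adapt to the input, which is the combination the abstract describes.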

coatnet, marrying convolution and attention, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.77)


DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection

Shanto, MD Sadik Hossain, Dihan, Mahir Labib, Ghosh, Souvik, Anonto, Riad Ahmed, Chowdhury, Hafijul Hoque, Muhtasim, Abir, Ahsan, Rakib, Hassan, MD Tanvir, Sojib, MD Roqunuzzaman, Hakim, Sheikh Azizul, Rahman, M. Saifur

arXiv.org Artificial Intelligence, Jan-27-2025

This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), which focuses on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned with a supervised contrastive loss to enhance feature separation. These models were chosen for their complementary strengths: MaxViT's integration of convolution layers and strided attention is well suited to detecting local features; CoAtNet's hybrid use of convolution and attention effectively captures multi-scale features; and EVA-02, robustly pretrained with masked image modeling, excels at capturing global features. After training, we freeze the parameters of these models and train the classification heads. Finally, a majority-voting ensemble combines the predictions of these models, improving robustness and generalization to unseen scenarios. The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves an accuracy of 95.83% on the validation dataset.
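Two of the steps above, the supervised contrastive loss and the majority-voting ensemble, can be sketched in plain Python. The loss below follows the commonly used "L_out" form of supervised contrastive learning (Khosla et al., 2020): for each anchor, positives are the other samples with the same label and the denominator runs over every other sample in the batch. The temperature of 0.1, the function names and the tie-breaking rule are illustrative assumptions; the report's exact loss variant and hyperparameters may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss over one batch of embeddings (L2-normalized internally)."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors with no same-label partner contribute nothing
        # Cosine similarity divided by temperature, for every other sample in the batch.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def majority_vote(per_model_preds: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class predictions sample by sample.

    Ties resolve to the tied label that appears first in model order.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_preds)]


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(emb, [0, 0, 1, 1]))  # small: same-label embeddings are aligned
    print(supcon_loss(emb, [0, 1, 0, 1]))  # large: positives point in different directions
    print(majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 0]]))  # [0, 1, 1]
```

With three backbones and a binary real/fake label, a strict majority always exists, so the tie rule only matters if more classes or an even number of models are used.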

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.16704

Country: Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report (0.52)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


CoAtNet: Marrying Convolution and Attention for All Data Sizes

Neural Information Processing Systems, Oct-9-2024, 18:03:47 GMT

Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks. In this work, we show that while Transformers tend to have larger model capacity, their generalization can be worse than that of convolutional networks due to the lack of the right inductive bias. To effectively combine the strengths of both architectures, we present CoAtNets (pronounced "coat" nets), a family of hybrid models built from two key insights: (1) depthwise Convolution and self-Attention can be naturally unified via simple relative attention; (2) vertically stacking convolution layers and attention layers in a principled way is surprisingly effective in improving generalization, capacity and efficiency. Experiments show that our CoAtNets achieve state-of-the-art performance under different resource constraints across various datasets: without extra data, CoAtNet achieves 86.0% ImageNet top-1 accuracy; when pre-trained with 13M images from ImageNet-21K, CoAtNet achieves 88.56% top-1 accuracy, matching ViT-huge pre-trained with 300M images from JFT-300M while using 23x less data; and when further scaled up with JFT-3B, CoAtNet achieves 90.88% top-1 accuracy on ImageNet, establishing a new state-of-the-art result.
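The second insight concerns where convolution and attention sit in the network. The abstract does not give the layout; the full paper settles on a C-C-T-T arrangement (a convolutional stem, two convolution stages, then two Transformer stages), with each stage halving the spatial resolution. The sketch below assumes a 224x224 input and simply counts token pairs to show why attention is deferred to the low-resolution stages, where its quadratic cost is affordable. The stage labels and helper functions are hypothetical and say nothing about channel widths or block counts.

```python
from typing import List, Tuple

# Illustrative C-C-T-T layout: stem, two convolution (MBConv) stages, two Transformer stages.
STAGES: List[Tuple[str, str]] = [
    ("S0", "conv stem"),
    ("S1", "convolution"),
    ("S2", "convolution"),
    ("S3", "transformer"),
    ("S4", "transformer"),
]


def stage_sides(image_size: int, num_stages: int) -> List[int]:
    """Feature-map side length at each stage, assuming every stage downsamples by 2."""
    return [image_size // 2 ** (k + 1) for k in range(num_stages)]


def attention_pairs(side: int) -> int:
    """Token pairs that full self-attention compares on a side x side feature map."""
    tokens = side * side
    return tokens * tokens


if __name__ == "__main__":
    for (name, kind), side in zip(STAGES, stage_sides(224, len(STAGES))):
        print(f"{name} {kind:>11}: {side}x{side} map, {attention_pairs(side):,} attention pairs")
```

At 56x56 (stage S1), full attention would compare 256 times more token pairs than at 14x14 (stage S3), which is one way to see why convolution handles the early, high-resolution stages.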

coatnet, marrying convolution and attention, top-1 accuracy, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.81)


SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification

Ashraf, S. M. Nabil, Mamun, Md. Adyelullahil, Abdullah, Hasnat Md., Alam, Md. Golam Rabiul

arXiv.org Artificial Intelligence, Nov-20-2023

Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which are crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid (CNN+Transformer) models and classical models. The best individual model was CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models in a weighted-average ensemble, with each model's weight determined by differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays.
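The weighted-average ensemble with differential-evolution weights can be sketched in a few dozen lines. This is a minimal sketch, assuming a single binary label (ChestX-ray14 is multi-label, so in practice one would average per-class AUROC) and a hand-written DE/rand/1/bin loop rather than any particular library; the function names, the toy scores and the hyperparameters (population size, F, CR) are illustrative assumptions, not the authors' settings.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative); ties count 0.5. Needs both classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(weights: Sequence[float], model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of per-model scores; weights are normalized to sum to 1."""
    total = sum(weights) or 1.0
    n = len(model_scores[0])
    return [sum(w * m[i] for w, m in zip(weights, model_scores)) / total for i in range(n)]


def fit_weights_de(labels: Sequence[int], model_scores: Sequence[Sequence[float]],
                   pop_size: int = 20, generations: int = 50, f: float = 0.8,
                   cr: float = 0.9, seed: int = 0) -> Tuple[List[float], float]:
    """DE/rand/1/bin search over weights in [0, 1] maximizing ensemble AUROC (pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(model_scores)

    def fitness(w: Sequence[float]) -> float:
        return auroc(labels, ensemble(w, model_scores))

    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(w) for w in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])  # mutation
                    trial.append(min(1.0, max(0.0, v)))  # clip to the [0, 1] bounds
                else:
                    trial.append(pop[i][d])  # crossover keeps the parent's coordinate
            s = fitness(trial)
            if s >= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1]
    model_scores = [
        [0.2, 0.4, 0.7, 0.6, 0.5, 0.9],  # hypothetical model A
        [0.3, 0.1, 0.4, 0.8, 0.6, 0.7],  # hypothetical model B
    ]
    for name, s in zip("AB", model_scores):
        print(name, auroc(labels, s))
    weights, best = fit_weights_de(labels, model_scores)
    print("ensemble weights", weights, "AUROC", best)
```

Optimizing AUROC directly is non-differentiable, which is one reason a derivative-free search such as differential evolution is a natural fit for choosing ensemble weights.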

dataset, neural network, transformer, (13 more...)

arXiv.org Artificial Intelligence

2311.0775

Country:

Asia > Thailand (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre:

Research Report > Promising Solution (0.88)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping

Jamali, Ali, Roy, Swalpa Kumar, Hong, Danfeng, Atkinson, Peter M, Ghamisi, Pedram

arXiv.org Artificial Intelligence, Aug-9-2023

Convolutional Neural Networks (CNNs) are used extensively for hierarchical feature extraction. Vision transformers (ViTs), through their self-attention mechanism, have recently modeled global contextual information better than CNNs. However, to realize their strength in image classification, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed SGU-MLP, a learning algorithm that effectively combines MLPs and spatial gating units (SGUs) for precise land use and land cover (LULC) mapping. The developed SGU-MLP classification algorithm outperformed several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. It was tested through three experiments, in Houston, USA; Berlin, Germany; and Augsburg, Germany, and consistently outperformed the benchmark CNN and CNN-ViT-based algorithms. For example, in the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, EfficientFormer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP
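The spatial gating unit originates in the gMLP architecture (Liu et al., 2021): the channels are split in half, one half is normalized and projected across token positions by a learned n x n matrix, and the result multiplicatively gates the other half. The sketch below follows that gMLP formulation in plain Python; the SGU-MLP paper's exact variant, shapes and initialization may differ, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are token positions, columns are channels


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one token's channels to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def spatial_gating_unit(z: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """gMLP-style SGU: split channels into (u, v), mix v across tokens, then gate u with it.

    z: n tokens x d channels (d even); w: n x n spatial weights; b: n biases.
    Returns n tokens x d/2 channels.
    """
    n, d = len(z), len(z[0])
    half = d // 2
    u = [row[:half] for row in z]
    v = [layer_norm(row[half:]) for row in z]
    # Spatial projection: token i's gate is a weighted sum over all tokens j, channel by channel.
    gate = [
        [sum(w[i][j] * v[j][c] for j in range(n)) + b[i] for c in range(half)]
        for i in range(n)
    ]
    return [[u[i][c] * gate[i][c] for c in range(half)] for i in range(n)]


if __name__ == "__main__":
    z = [[1.0, 2.0, 0.5, -0.5], [3.0, 4.0, 1.0, 2.0], [5.0, 6.0, -1.0, 0.0]]
    n = len(z)
    zeros = [[0.0] * n for _ in range(n)]
    ones = [1.0] * n
    print(spatial_gating_unit(z, zeros, ones))  # identity gating: the first half of z's channels
```

With W initialized to zeros and b to ones, as gMLP recommends, the unit starts out as an identity on the first half of the channels; training then learns how much each position should draw on the others, which is how the MLP captures spatial context without attention.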

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.05235

Country:

North America > United States (0.24)
Europe > Germany > Berlin (0.24)
Europe > Austria > Vienna (0.14)
(4 more...)

Genre: Research Report (0.64)

Industry: Law > Real Estate Law (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Google Introduces Families of Neural Networks To Train Faster, SOTA Performance

#artificialintelligence, Sep-27-2021, 11:10:10 GMT

Google's AI research team recently introduced two families of neural networks for image recognition: EfficientNetV2 and CoAtNet. EfficientNetV2 consists of CNNs designed for faster training on small-scale datasets such as ImageNet1K (1.28 million images), while CoAtNet combines convolution and self-attention to achieve higher accuracy on large-scale datasets such as ImageNet21K (13 million images) and JFT (3 billion images). According to Google, EfficientNetV2 and CoAtNet train four to ten times faster while achieving state-of-the-art results, including 90.88 per cent top-1 accuracy on the well-established ImageNet dataset. The team has also released the source code and pretrained models on the Google AutoML GitHub repository. Training efficiency has become a critical focus in deep learning as neural network models and training datasets grow. For instance, GPT-3 shows remarkable capabilities in few-shot learning, but it needs weeks of training on hundreds or thousands of GPUs, making it difficult to retrain or improve.
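The dataset sizes quoted across these entries span more than three orders of magnitude. A quick calculation, using only the approximate image counts given on this page, reproduces the "23x less data" comparison from the CoAtNet abstract (13M ImageNet-21K images versus 300M JFT-300M images). The dictionary and the `ratio` helper are illustrative only.

```python
# Approximate image counts as quoted in the entries on this page.
DATASETS = {
    "ImageNet-1K": 1.28e6,
    "ImageNet-21K": 13e6,
    "JFT-300M": 300e6,
    "JFT-3B": 3e9,
}


def ratio(larger: str, smaller: str) -> float:
    """How many times more images `larger` has than `smaller`."""
    return DATASETS[larger] / DATASETS[smaller]


if __name__ == "__main__":
    print(f"JFT-300M / ImageNet-21K: {ratio('JFT-300M', 'ImageNet-21K'):.1f}x")  # the "23x less data" claim
    print(f"ImageNet-21K / ImageNet-1K: {ratio('ImageNet-21K', 'ImageNet-1K'):.1f}x")
    print(f"JFT-3B / ImageNet-1K: {ratio('JFT-3B', 'ImageNet-1K'):,.0f}x")
```

The three ratios come out at roughly 23, 10 and 2,344, so JFT-3B is over two thousand times larger than the ImageNet-1K set that EfficientNetV2 targets.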

accuracy, coatnet, dataset, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
