AITopics | resnet50

resnet50

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Neural Information Processing Systems | Apr-25-2026, 13:15:12 GMT

Modern convolutional neural networks (CNNs) contain many identical convolution blocks, and hence recursive sharing of parameters across these blocks has been proposed to reduce the number of parameters. However, naive sharing of parameters poses many challenges, such as limited representational power and the vanishing/exploding gradients problem of recursively shared parameters. In this paper, we present a recursive convolution block design and training method in which a recursively shareable part, or filter basis, is separated and learned while effectively avoiding the vanishing/exploding gradients problem during training. We show that the unwieldy vanishing/exploding gradients problem can be controlled by enforcing the elements of the filter basis to be orthonormal, and empirically demonstrate that the proposed orthogonality regularization improves the flow of gradients during training. Experimental results on image classification and object detection show that our approach, unlike previous parameter-sharing approaches, does not trade performance to save parameters and consistently outperforms overparameterized counterpart networks. This superior performance demonstrates that the proposed recursive convolution block design and the orthogonality regularization not only prevent performance degradation, but also consistently improve the representation capability while a significant number of parameters are recursively shared.
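
As a rough illustration of what an orthonormality penalty on a filter basis can look like, the sketch below flattens each basis filter into a row and measures how far the Gram matrix is from the identity. This is a generic soft-orthogonality term, not necessarily the exact regularizer used in the paper; the function name `orthonormality_penalty` and the basis shape (8 filters, 4 input channels, 3x3 kernels) are assumptions made for the example.

```python
import numpy as np

def orthonormality_penalty(basis: np.ndarray) -> float:
    """Squared Frobenius norm of (B B^T - I), with one flattened filter per row of B."""
    flat = basis.reshape(basis.shape[0], -1)      # (n_filters, channels * k * k)
    gram = flat @ flat.T                          # pairwise inner products of filters
    return float(np.sum((gram - np.eye(flat.shape[0])) ** 2))

rng = np.random.default_rng(0)

# Hypothetical filter basis: 8 filters, 4 input channels, 3x3 kernels.
random_basis = rng.normal(size=(8, 4, 3, 3))

# For comparison, build an orthonormal basis of the same shape via QR.
q, _ = np.linalg.qr(random_basis.reshape(8, -1).T)   # q: (36, 8), orthonormal columns
ortho_basis = q.T.reshape(8, 4, 3, 3)

penalty_random = orthonormality_penalty(random_basis)
penalty_ortho = orthonormality_penalty(ortho_basis)
```

The penalty is essentially zero for the orthonormalized basis and large for the random one; in training, a term like this would be added to the task loss with some weighting coefficient.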

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization

Neural Information Processing Systems | Apr-24-2026, 19:33:09 GMT

This paper presents a new algorithm for domain generalization (DG), the test-time template adjuster (T3A), which aims to make a model robust to unknown distribution shift. Unlike existing methods that focus on the training phase, our method focuses on the test phase, i.e., the model corrects its own predictions at test time. Specifically, T3A adjusts a trained linear classifier (the last layer of a deep neural network) with the following procedure: (1) compute a pseudo-prototype representation for each class using online unlabeled data augmented by the base classifier trained in the source domains, and then (2) classify each sample based on its distance to the pseudo-prototypes. T3A is back-propagation-free and modifies only the linear layer; therefore, the increase in computational cost during inference is negligible, and it avoids the catastrophic failure that might be caused by stochastic optimization. Despite its simplicity, T3A can leverage knowledge about the target domain by using off-the-shelf test-time data and improve performance. We tested our method on four domain generalization benchmarks, namely PACS, VLCS, OfficeHome, and TerraIncognita, along with various backbone networks including ResNet18, ResNet50, Big Transfer (BiT), Vision Transformers (ViT), and MLP-Mixer. The results show that T3A stably improves performance on unseen domains across choices of backbone networks, and outperforms existing domain generalization methods.
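
The two-step procedure in the abstract can be sketched roughly as follows. This is a simplified reading, not the authors' implementation: it assumes each class prototype is seeded with the corresponding row of the trained linear classifier, uses a running mean of pseudo-labeled features, and classifies by Euclidean distance. The function name `t3a_like_predict` and the toy inputs are hypothetical, and any sample filtering the paper may apply is omitted.

```python
import numpy as np

def t3a_like_predict(features, weights, bias):
    """Online prototype adjustment sketch: returns one predicted label per sample."""
    n_classes = weights.shape[0]
    # Assumption: each class prototype starts from its classifier weight vector.
    sums = weights.astype(float).copy()
    counts = np.ones(n_classes)
    preds = []
    for z in features:
        # Step 1: pseudo-label the sample with the base classifier and
        # fold its feature into that class's running prototype.
        pseudo = int(np.argmax(weights @ z + bias))
        sums[pseudo] += z
        counts[pseudo] += 1
        prototypes = sums / counts[:, None]
        # Step 2: classify by distance to the current pseudo-prototypes.
        dists = np.linalg.norm(prototypes - z, axis=1)
        preds.append(int(np.argmin(dists)))
    return np.array(preds)

# Toy two-class example with 2-D features (values chosen for illustration only).
toy_weights = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_bias = np.zeros(2)
toy_features = np.array([[2.0, 0.1], [0.1, 2.0], [3.0, 0.0]])
toy_preds = t3a_like_predict(toy_features, toy_weights, toy_bias)
```

No gradients are computed anywhere in the loop, which matches the abstract's point that the adjustment is back-propagation-free and touches only the final linear layer.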

artificial intelligence, deep learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Diffused Redundancy

Neural Information Processing Systems | Apr-24-2026, 17:51:46 GMT

A.1 CKA Definition. In all our evaluations we use CKA with a linear kernel [24], which essentially amounts to the following steps: A.2 Additional CKA Results. Fig. 9 shows a CKA comparison between randomly chosen parts of the layer and the full layer for different kinds of ResNet50. We observe that even a ResNet50 trained with the MRL loss shows a significant amount of diffused redundancy. Figure 9: [Comparison of diffused redundancy in MRL vs. other losses, through the lens of CKA] We see a similar trend to that reported in Fig. 7 in the main paper, where even the MRL model shows a significant amount of diffused redundancy despite being explicitly trained to instead have structured redundancy. The amount of diffused redundancy, however, is much less than in the ResNets trained using the standard loss and adv. Here we list the sources of weights for the various pre-trained models used in our experiments: ResNet18 trained on ImageNet1k using standard loss: taken from timm v0.6.1.
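
The steps of the CKA definition are missing from this excerpt. As a point of reference, the sketch below implements the commonly used linear-kernel CKA formula on column-centered activations, ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F), and compares a random subset of neurons against the full layer. The activation matrix here is synthetic Gaussian noise and the sizes (200 samples, 64 neurons, 16 selected) are assumptions for illustration, so the resulting number says nothing about real ResNet50 features.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    x = x - x.mean(axis=0, keepdims=True)   # center each feature column
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)

# Synthetic stand-in for one layer's activations: 200 samples, 64 neurons.
full_layer = rng.normal(size=(200, 64))

# Randomly chosen part of the layer (16 of 64 neurons).
subset_idx = rng.choice(64, size=16, replace=False)

cka_self = linear_cka(full_layer, full_layer)
cka_subset = linear_cka(full_layer[:, subset_idx], full_layer)
cka_scaled = linear_cka(2.0 * full_layer[:, subset_idx], full_layer)
```

Linear CKA equals 1 when a representation is compared with itself and is unchanged by isotropic rescaling, which is why `cka_scaled` matches `cka_subset`.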

artificial intelligence, diffused redundancy, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Diffused Redundancy in Pre-trained Representations

Neural Information Processing Systems | Apr-24-2026, 17:51:42 GMT

Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and performs similarly to the whole layer on a variety of downstream tasks. For example, a linear probe trained on 20% of randomly picked neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also caution about certain possible unintended consequences.
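
The linear-probe comparison described above can be mimicked on synthetic data. The sketch below is only a toy under stated assumptions: the "layer" is 100 neurons that redundantly encode a 5-dimensional latent signal plus noise, the probe is an ordinary least-squares classifier, and the subset is 20% of neurons chosen at random. None of the sizes, names, or numbers come from the paper, and the accuracies it produces are not the paper's results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_latent, n_neurons = 1000, 500, 5, 100

# Hypothetical redundant layer: every neuron is a noisy mix of a few latent factors.
mixing = rng.normal(size=(n_latent, n_neurons))

def make_split(n):
    latent = rng.normal(size=(n, n_latent))
    feats = latent @ mixing + 0.1 * rng.normal(size=(n, n_neurons))
    labels = np.where(latent[:, 0] > 0, 1.0, -1.0)   # binary downstream task
    return feats, labels

def add_bias(a):
    return np.hstack([a, np.ones((a.shape[0], 1))])

def probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a least-squares linear probe and report test accuracy."""
    coef, *_ = np.linalg.lstsq(add_bias(train_x), train_y, rcond=None)
    return float(np.mean(np.sign(add_bias(test_x) @ coef) == test_y))

train_x, train_y = make_split(n_train)
test_x, test_y = make_split(n_test)

# Randomly pick 20% of the neurons.
subset = rng.choice(n_neurons, size=n_neurons // 5, replace=False)

acc_full = probe_accuracy(train_x, train_y, test_x, test_y)
acc_subset = probe_accuracy(train_x[:, subset], train_y, test_x[:, subset], test_y)
```

Because the latent signal is spread across all neurons in this toy setup, the probe on the random subset stays close to the probe on the full layer; with real networks the gap would depend on the pre-training loss, dataset, and downstream task, as the abstract notes.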

artificial intelligence, machine learning, redundancy, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.65)

Industry:

Government (1.00)
Health & Medicine (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Improving robustness to corruptions with multiplicative weight perturbations

Neural Information Processing Systems | Mar-19-2026, 18:42:54 GMT

Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet and ImageNet) and neural network architectures (ResNet50, ViT-S/16, ViT-B/16) show that DAMP enhances model generalization performance in the presence of corruptions across different settings. Notably, DAMP is able to train a ViT-S/16 on ImageNet from scratch, reaching a top-1 error of 23.7%, which is comparable to ResNet50, without extensive data augmentations.
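
For a single linear layer, the claim that input perturbations can be mimicked in weight space is easy to check numerically: scaling input feature j by a factor gives the same output as scaling column j of the weight matrix by that factor. The sketch below shows only this one-layer identity, not the paper's general argument. The Gaussian noise centered at 1 and the value of `sigma` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.normal(size=(3, 5))   # hypothetical linear layer: 5 inputs, 3 outputs
x = rng.normal(size=5)              # one input vector

# Assumption: multiplicative noise drawn around 1 with a small standard deviation.
sigma = 0.2
xi = rng.normal(loc=1.0, scale=sigma, size=5)

# Perturb the input elementwise, keep the weights fixed.
out_input_perturbed = weights @ (x * xi)

# Keep the input fixed, scale column j of the weights by xi[j] (via broadcasting).
out_weight_perturbed = (weights * xi) @ x

max_gap = float(np.max(np.abs(out_input_perturbed - out_weight_perturbed)))
```

The two outputs agree up to floating-point error, which is the one-layer intuition behind training under random multiplicative weight perturbations instead of adding corruptions to the data pipeline.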

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Neural Information Processing Systems | Feb-16-2026, 23:50:31 GMT

In this work, we analyze stability as a property for reliable feature attribution methods.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Industry:

Law (0.46)
Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

a4316bb210a59fb7aafeca5dd21c2703-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 07:24:28 GMT

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(4 more...)

76d2f8e328e1081c22a77ca0fa330ca5-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-14-2026, 21:30:41 GMT

artificial intelligence, machine learning, visualization, (18 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Supplemental Material for AC-GC: Lossy Activation Compression with Guaranteed Convergence

Neural Information Processing Systems | Feb-11-2026, 16:11:02 GMT

The appendices of this supplemental material are focused on providing detailed proofs (Appendix A), per-layer derivations for activation errors (Appendix B), algorithm and implementation details (Appendix C), datasets and hyperparameters (Appendix D), extended experimental data (Appendix E), and additional experiments (Appendix F) to accompany the main paper. A code example and trained models are available for CIFAR10/ResNet50 by accessing https://github.com/rdevans0/acgc. L and η depend on the model being trained and dataset, and are thus problem-dependent constants. Preliminary on Separation of Norms: Given two independent random vectors $A = (a_n) \in \mathbb{R}^N$ and $B = (b_n) \in \mathbb{R}^N$, where $E[b_n] = 0 \;\forall n$. Given $f$ which obeys (4), and a convex function $D(\Delta X)$ which bounds the gradient error from above for all $X$, $\theta$, and $\Delta X$; provided that $D(\Delta X) \le e^2 V^2$, the variance of the compressed gradients satisfies $E[\|\hat{\nabla}_\theta f(\theta, X_{n_t})\|^2] \le (1 + e^2) V^2$ (16). Proof.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: