AITopics | deep residual learning

Multimodal Residual Learning for Visual QA

Neural Information Processing Systems, Mar-17-2026, 10:02:11 GMT

Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.
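
As an illustration of the element-wise multiplication idea in the abstract above, the following is a minimal Python/NumPy sketch of one joint residual block. The function name `joint_residual_block`, the tanh nonlinearities, the weight shapes, and the identity shortcut on the question vector are all assumptions made for this example. The abstract does not give the exact equations, so this should not be read as the authors' implementation.

```python
import numpy as np


def joint_residual_block(q, v, W_q, W_v):
    """Hypothetical joint residual block: q + tanh(W_q q) * tanh(W_v v).

    q   : question (language) vector, shape (d_q,)
    v   : visual feature vector, shape (d_v,)
    W_q : assumed projection of q, shape (d_q, d_q)
    W_v : assumed projection of v into the question space, shape (d_q, d_v)
    """
    # Joint residual mapping: element-wise product of the two projected modalities.
    joint = np.tanh(W_q @ q) * np.tanh(W_v @ v)
    # Shortcut connection: add the joint term back onto the question vector.
    return q + joint


# Toy dimensions and random inputs, chosen only for illustration.
rng = np.random.default_rng(0)
d_q, d_v = 4, 6
q = rng.normal(size=d_q)
v = rng.normal(size=d_v)
W_q = rng.normal(size=(d_q, d_q))
W_v = rng.normal(size=(d_q, d_v))

h = joint_residual_block(q, v, W_q, W_v)
```

With all-zero weights the joint term vanishes and the block returns the question vector unchanged, which is the residual behaviour the sketch is meant to show.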

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Deep Residual Learning in Spiking Neural Networks

Neural Information Processing Systems, Dec-24-2025, 18:13:00 GMT

Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. The previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers from the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps. Moreover, SEW ResNet can achieve higher performance by simply adding more layers, providing a simple method to train deep SNNs. To the best of our knowledge, this is the first time that directly training deep SNNs with more than 100 layers has become possible.
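
To make the identity-mapping claim in the abstract above more concrete, here is a minimal Python/NumPy sketch of a spike-element-wise style block. It rests on several assumptions not stated in the abstract: the spiking neuron is reduced to a stateless threshold (no membrane dynamics or time steps), the residual branch is a single linear map `W`, and the element-wise function combining the branch output with the input is plain addition. The names `spike` and `sew_block` are hypothetical.

```python
import numpy as np


def spike(x, threshold=1.0):
    """Simplified, stateless spiking neuron: emit 1 where the input reaches the threshold."""
    return (x >= threshold).astype(np.float32)


def sew_block(s, W):
    """Hypothetical SEW-style block: the residual branch spikes first, then is added to the input.

    s : input spike vector with entries in {0, 1}
    W : weights of the (assumed linear) residual branch
    """
    branch = spike(W @ s)  # residual branch output is itself a spike vector
    # Element-wise combination (assumed ADD); the sum may exceed 1,
    # so the block output is not strictly binary.
    return branch + s


s = np.array([0, 1, 1, 0], dtype=np.float32)

# A silent residual branch (zero weights) leaves the input untouched: identity mapping.
identity_out = sew_block(s, np.zeros((4, 4), dtype=np.float32))

# A branch that fires wherever the input fired doubles those entries.
active_out = sew_block(s, 2.0 * np.eye(4, dtype=np.float32))
```

In this toy setting the block reduces to the identity whenever the residual branch emits no spikes, which is the property the sketch is meant to illustrate.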

deep residual learning, resnet, spiking neural network, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)

Multimodal Residual Learning for Visual QA

Neural Information Processing Systems, Nov-21-2025, 15:06:48 GMT

Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.

multimodal residual learning, name change, residual learning, (5 more...)

Neural Information Processing Systems

Genre: Research Report (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

cfa8440d500a6a6867157dfd4eaff66e-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-19-2025, 02:22:36 GMT

artificial intelligence, layer layer index index, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Residual Learning in Spiking Neural Networks

Neural Information Processing Systems, Jan-18-2025, 17:38:37 GMT

Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. The previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers from the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps.

deep residual learning, resnet, spiking neural network, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.78)

Advancing Deep Residual Learning by Solving the Crux of Degradation in Spiking Neural Networks

Hu, Yifan, Wu, Yujie, Deng, Lei, Li, Guoqi

arXiv.org Artificial Intelligence, Feb-17-2022

Despite the rapid progress of neuromorphic computing, the inadequate depth and the resulting insufficient representation power of spiking neural networks (SNNs) severely restrict their application scope in practice. Residual learning and shortcuts have proven to be an important approach for training deep neural networks, but previous work has rarely assessed their applicability to the characteristics of spike-based communication and spatiotemporal dynamics. This neglect leads to impeded information flow and the accompanying degradation problem. In this paper, we identify the crux and then propose a novel residual block for SNNs, which is able to significantly extend the depth of directly trained SNNs, e.g., up to 482 layers on CIFAR-10 and 104 layers on ImageNet, without observing even a slight degradation problem. We validate the effectiveness of our methods on both frame-based and neuromorphic datasets, and our SRM-ResNet104 achieves a superior result of 76.02% accuracy on ImageNet, the first time in the domain of directly trained SNNs. We also estimate the energy efficiency: the resulting networks need on average only one spike per neuron for classifying an input sample. We believe our powerful and scalable modeling will provide strong support for further exploration of SNNs.

deep residual learning, degradation, spiking neural network, (1 more...)

arXiv.org Artificial Intelligence

2201.07209

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Deep Residual Learning for Image Recognition (2015)

#artificialintelligence, May-15-2021, 21:10:11 GMT

Short summaries (1–2 minutes reading time) to help you (and me) understand and remember important papers/concepts about machine learning and related topics. "If you can't explain it simply, you don't understand it well enough" -- Einstein, maybe.

deep residual learning, image recognition

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)

Multimodal Residual Learning for Visual QA

Kim, Jin-Hwa, Lee, Sang-Woo, Kwak, Donghyun, Heo, Min-Oh, Kim, Jeonghee, Ha, Jung-Woo, Zhang, Byoung-Tak

Neural Information Processing Systems, Feb-14-2020, 05:41:43 GMT

Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from visual and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies.

multimodal residual learning, residual learning, visual qa, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)

Introduction to ResNets – Towards Data Science

#artificialintelligence, Jan-26-2019, 01:35:04 GMT

In 2012, Krizhevsky et al. [1] rolled out the red carpet for the Deep Convolutional Neural Network. This was the first time this architecture was more successful than traditional, hand-crafted feature learning on ImageNet. Their DCNN, named AlexNet, contained 8 neural network layers: 5 convolutional and 3 fully-connected. This laid the foundation for the traditional CNN: a convolutional layer followed by an activation function followed by a max pooling operation (sometimes the pooling operation is omitted to preserve the spatial resolution of the image). Much of the success of Deep Neural Networks has been attributed to these additional layers.

artificial intelligence, machine learning, skip connection, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Spectrum concentration in deep residual learning: a free probability approach

Ling, Zenan, Qiu, Robert C.

arXiv.org Machine Learning, Jul-31-2018

We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool in free probability to the community of deep learning. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, this new tool enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet for both linear and nonlinear cases. With the powerful tool of free probability, we conduct an asymptotic analysis of the spectrum in the single-layer case, and then extend this analysis to the multi-layer case of an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum has the order of $O(1)$, when compared with the large width and depth of the network. We empirically demonstrate that the proposed initialization scheme learns orders of magnitude faster than the classical ones, which attests to the strong practical relevance of this investigation.
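
The rescaling idea in the abstract above can be illustrated numerically. The following Python/NumPy sketch builds the input-output Jacobian of a purely linear, fully-connected residual network and compares its largest singular value under two Gaussian initializations. Several details are assumptions made for this example and are not taken from the abstract: each unit is modeled as x + W x, the classical weight variance is 1/width, and the rescaling divides that variance by the number of residual units. The function name `jacobian_top_singular_value` is hypothetical.

```python
import numpy as np


def jacobian_top_singular_value(width, depth, scale_by_depth, seed=0):
    """Largest singular value of the Jacobian of a linear ResNet with units x -> x + W x."""
    rng = np.random.default_rng(seed)
    variance = 1.0 / width       # assumed "classical" Gaussian initialization
    if scale_by_depth:
        variance /= depth        # assumed rescaling by the number of residual units
    jacobian = np.eye(width)
    for _ in range(depth):
        W = rng.normal(scale=np.sqrt(variance), size=(width, width))
        # Each residual unit contributes the factor (I + W) to the Jacobian.
        jacobian = (np.eye(width) + W) @ jacobian
    # Singular values are returned in descending order; take the largest.
    return np.linalg.svd(jacobian, compute_uv=False)[0]


classical = jacobian_top_singular_value(width=64, depth=32, scale_by_depth=False)
rescaled = jacobian_top_singular_value(width=64, depth=32, scale_by_depth=True)
print(f"classical: {classical:.3g}, rescaled: {rescaled:.3g}")
```

In this linear toy setting the classical variant's largest singular value grows rapidly with depth, while the depth-rescaled variant stays of moderate size. The sketch covers only the linear case and says nothing about training speed.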

artificial intelligence, machine learning, resnet, (17 more...)

arXiv.org Machine Learning

1807.11694
