AITopics | identity mapping

2f4b6febe0b70805c3be75e5d6a66918-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 07:57:51 GMT

artificial intelligence, machine learning, subset, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Neural Information Processing Systems | Mar-17-2026, 16:47:23 GMT

In recent years, stochastic gradient descent (SGD) based techniques have become the standard tools for training neural networks. However, a formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called identity mapping. We prove that, if the input follows a Gaussian distribution, then with the standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in a polynomial number of steps.
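
As a rough illustration of the setting described in this abstract, the following Python sketch trains a two-layer ReLU network with an identity mapping on Gaussian inputs using minibatch SGD. The abstract does not state the model or the loss, so several details here are assumptions: the network form $f(x) = \sum_j \mathrm{ReLU}(((I + W)^\top x)_j)$, a teacher network `W_star` that generates the targets, a squared loss, and all names and hyperparameters (`init_weights`, `scale`, `lr`, `batch`). It is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def init_weights(d, rng, scale=0.25):
    # Assumed reading of "O(1/sqrt(d)) initialization": i.i.d. Gaussian
    # entries with standard deviation scale / sqrt(d).
    return rng.normal(0.0, scale / np.sqrt(d), size=(d, d))


def forward(W, X):
    # Assumed model: f(x) = sum_j ReLU(((I + W)^T x)_j), one value per row of X.
    d = W.shape[0]
    return relu(X @ (np.eye(d) + W)).sum(axis=1)


def loss(W, W_star, X):
    # Mean squared error between student W and teacher W_star (assumed loss).
    return 0.5 * np.mean((forward(W, X) - forward(W_star, X)) ** 2)


def sgd_step(W, W_star, X, lr):
    # One minibatch SGD step on the squared loss.
    d = W.shape[0]
    pre = X @ (np.eye(d) + W)                       # pre-activations, (batch, d)
    err = relu(pre).sum(axis=1) - forward(W_star, X)
    # d f / d W[i, j] = x_i * 1[pre_j > 0], averaged over the minibatch.
    grad = X.T @ (err[:, None] * (pre > 0)) / X.shape[0]
    return W - lr * grad


def train(d=8, steps=2000, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W_star = init_weights(d, rng)                   # hypothetical teacher
    W = init_weights(d, rng)                        # student
    X_eval = rng.standard_normal((4096, d))         # held-out Gaussian inputs
    before = loss(W, W_star, X_eval)
    for _ in range(steps):
        X = rng.standard_normal((batch, d))         # fresh Gaussian minibatch
        W = sgd_step(W, W_star, X, lr)
    return before, loss(W, W_star, X_eval)


if __name__ == "__main__":
    before, after = train()
    print(f"loss before: {before:.3e}, after: {after:.3e}")
```

Drawing a fresh Gaussian minibatch at every step mirrors the population-risk setting that the abstract analyzes; the held-out set `X_eval` is only used to report the loss before and after training.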

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang

Neural Information Processing Systems | Mar-15-2026, 17:25:14 GMT

The basic principles for designing convolutional neural network (CNN) structures that predict objects at different levels, e.g., image-level, region-level, and pixel-level, are diverging. Generally, network structures designed specifically for image classification are used directly as the default backbone for other tasks, including detection and segmentation, but backbone structures are seldom designed to unify the advantages of networks built for pixel-level or region-level prediction tasks, which may require very deep features with high resolution. Towards this goal, we design a fish-like network, called FishNet. In FishNet, the information of all resolutions is preserved and refined for the final task. Besides, we observe that existing works still cannot directly propagate the gradient information from deep layers to shallow layers. Our design can better handle this problem. Extensive experiments have been conducted to demonstrate the remarkable performance of FishNet. In particular, on ImageNet-1k, FishNet surpasses the accuracy of DenseNet and ResNet with fewer parameters. FishNet was applied as one of the modules in the winning entry of the COCO Detection 2018 challenge.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

afe434653a898da20044041262b3ac74-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 17:29:34 GMT

neural network, resnet, spiking resnet, (15 more...)

Neural Information Processing Systems

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

a57e8915461b83adefb011530b711704-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 11:32:14 GMT

attention matrix, matrix, transformer, (12 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Factor Graph Neural Net -- Supplementary File: A Proof of propositions

Neural Information Processing Systems | Feb-8-2026, 15:35:06 GMT

First we provide Lemma 8, which will be used in the proofs of Propositions 2 and 4. Lemma 8. Given $n$ non-negative feature vectors $f_i = [f_{i0}, f_{i1}, \ldots, f_{im}]$, where $i = 1, \ldots, n$, there exist $n$ matrices $Q_i$ with shape $nm \times m$ and $n$ vectors $\hat{f}_i = Q_i f_i^T$, s.t. [...] Proposition 2. A factor graph $G = (V, C, E)$ with variable log potentials $\theta_i(x_i)$ and factor log potentials $\phi_c(x_c)$ can be converted to a factor graph $G'$ with the same variable potentials and the decomposed log potentials $\phi_{ic}(x_i, z_c)$ using a one-layer FGNN. Without loss of generality, we assume that $\log \varphi_c(x_c) > 1$. Then for each $i$ the item $\theta_{ic}(x_i, z_c)$ in (9) has $k^{n+1}$ entries, and each entry is either a scaled entry of the vector $g_c$ or an arbitrary negative number less than $\max_{x_c} \theta_c(x_c)$. Thus if we organize $\theta_{ic}(x_i, z_c)$ as a length-$k^{n+1}$ vector $f_{ic}$, then we define a $k^{n+1} \times k^n$ matrix $Q_{ci}$, where if and only if the $l$th entry of $f_{ic}$ is set to the $m$th entry of $g_c$ multiplied by $1/|s(c)|$, the entry of $Q_{ci}$ in the $l$th row and $m$th column is set to $1/|s(c)|$; all the other entries of $Q_{ci}$ are set to some negative number smaller than $\max_{x_c} \theta_c(x_c)$.

artificial intelligence, fgnn, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Neural Information Processing Systems | Nov-21-2025, 15:56:48 GMT

In recent years, stochastic gradient descent (SGD) based techniques have become the standard tools for training neural networks. However, a formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called identity mapping. We prove that, if the input follows a Gaussian distribution, then with the standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in a polynomial number of steps.

convergence analysis, name change, two-layer neural network, (7 more...)

Neural Information Processing Systems

Technology: