Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

Neural Information Processing Systems

We utilize the sparsity operation proposed in Section 3.1 for ResNet-50. For ViT, we also apply the spatial Top-K operation as described in the general response. We observe an increase for both the ResNet-50 and ViT-B architectures, further closing the gap between humans and existing models. We generalize Section 4.2 in the main text to the ResNet-50 and ViT-B architectures (Figure 1). The sparsity definition for ResNet-50 is the same as for AlexNet and VGG. For ViT-B, we reshape the intermediate activation response from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention (note that h and w are the height and width of the latent tensor after reshaping it to 2D; for ViT-B with patch size 16 on 224x224 images, h = w = 14, and n denotes the batch size).
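As a rough illustration of the spatial Top-K step described above, the following minimal PyTorch sketch (the function name, hook point, and the choice of k are our own assumptions, not the paper's exact code) reshapes a [n, h * w, d] token tensor to [n, d, h * w], keeps only the k largest responses per channel along the spatial dimension, and zeroes the rest before the tokens reach multi-head attention.

```python
import torch

def spatial_topk(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest spatial responses per channel, zero the rest.

    x: ViT token activations of shape [n, h*w, d]
       (h = w = 14 for ViT-B/16 on 224x224 inputs).
    """
    z = x.transpose(1, 2)                       # [n, d, h*w]: Top-K over dim 2
    _, topk_idx = z.topk(k, dim=2)              # per-channel k largest responses
    mask = torch.zeros_like(z).scatter_(2, topk_idx, 1.0)
    z = z * mask                                # suppress the remaining positions
    return z.transpose(1, 2)                    # back to [n, h*w, d]

# Example: keep 20% of the 196 spatial positions of a ViT-B/16 activation.
tokens = torch.randn(8, 14 * 14, 768)
sparse_tokens = spatial_topk(tokens, k=int(0.2 * 14 * 14))
```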


A Appendix

Neural Information Processing Systems

A.1 Creation of the Multimodal Web Document Dataset

A.1.1 Collecting a Large Number of HTML Files

Our data collection process begins by considering the 25 most recent Common Crawl snapshots, which contain webpages spanning from February 2020 to January/February 2023. This process yields a total of 41.2 billion documents.

Selection of English content. To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, effectively filtering out 63.6% of the documents.

Early text deduplication. Often, a set of URLs is crawled repeatedly across different Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).
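As a rough sketch of the deduplication step (not the authors' pipeline; the datasketch library, the shingling helper, and the similarity threshold are our assumptions), the example below computes MinHash signatures with 16 permutations over word 5-grams and uses an LSH index to flag near-duplicate documents.

```python
from datasketch import MinHash, MinHashLSH

def word_5grams(text: str):
    """Yield word 5-grams used as shingles for MinHash."""
    words = text.lower().split()
    for i in range(len(words) - 4):
        yield " ".join(words[i:i + 5])

def minhash_signature(text: str, num_perm: int = 16) -> MinHash:
    """MinHash signature with 16 hash functions over 5-grams."""
    m = MinHash(num_perm=num_perm)
    for shingle in word_5grams(text):
        m.update(shingle.encode("utf-8"))
    return m

# Index documents and keep only the first copy of near-duplicates
# (the 0.8 Jaccard threshold is an assumption for illustration).
lsh = MinHashLSH(threshold=0.8, num_perm=16)
docs = {"doc_a": "some web page text ...", "doc_b": "some web page text ..."}
for doc_id, text in docs.items():
    sig = minhash_signature(text)
    if not lsh.query(sig):          # no previously indexed near-duplicate
        lsh.insert(doc_id, sig)
```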



Gradient-Based Feature Learning under Structured Data, Denny Wu, Murat A. Erdogdu

Neural Information Processing Systems

Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
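To make the two parameterizations concrete, here is a small simulation sketch under our own assumptions (the spike strength, the degree-3 Hermite link, the student model, and the step size are not the paper's exact setup): data follow the spiked model x ~ N(0, I + theta * v v^T) with labels y = f*(<x, w*>), and a single direction is trained online either with spherical gradient dynamics or with a weight-normalization style reparameterization w = u / ||u||.

```python
import numpy as np

rng = np.random.default_rng(0)
d, theta, steps, lr = 64, 4.0, 20000, 1e-3

w_star = np.zeros(d); w_star[0] = 1.0           # target direction
v = w_star.copy()                                # spike aligned with the target
link = lambda z: z ** 3 - 3 * z                  # He_3 link, information exponent 3
dlink = lambda z: 3 * z ** 2 - 3

def sample():
    # x ~ N(0, I + theta * v v^T): isotropic noise plus a rank-one spike
    x = rng.standard_normal(d) + np.sqrt(theta) * rng.standard_normal() * v
    return x, link(x @ w_star)

# (a) Spherical gradient dynamics: tangent-space gradient step, then renormalize.
w = rng.standard_normal(d); w /= np.linalg.norm(w)
for _ in range(steps):
    x, y = sample()
    g = (link(x @ w) - y) * dlink(x @ w) * x     # gradient of the square loss
    g -= (g @ w) * w                             # project onto the tangent space
    w -= lr * g
    w /= np.linalg.norm(w)

# (b) Weight-normalized dynamics: optimize u and predict with w = u / ||u||.
u = rng.standard_normal(d)
for _ in range(steps):
    x, y = sample()
    n = np.linalg.norm(u); w_b = u / n
    g = (link(x @ w_b) - y) * dlink(x @ w_b) * x
    u -= lr * (g - (g @ w_b) * w_b) / n          # chain rule through u / ||u||

print("alignment, spherical dynamics: ", abs(w @ w_star))
print("alignment, weight-normalized:  ", abs((u / np.linalg.norm(u)) @ w_star))
```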


On the Efficiency of ERM in Feature Learning

Neural Information Processing Systems

Given a collection of feature maps indexed by a set T, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims at capturing the simplest instance of feature learning, where the model is expected to jointly learn from the data an appropriate feature map and a linear predictor. We start by studying the asymptotic quantiles of the excess risk of sequences of empirical risk minimizers. Remarkably, we show that when the set T is not too large and when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows a priori this optimal feature map and deterministically outputs an empirical risk minimizer from the associated optimal linear class. We complement this asymptotic result with a non-asymptotic analysis that quantifies the decaying effect of the global complexity of the set T on the excess risk of ERM, and relates it to the size of the sublevel sets of the suboptimality of the feature maps. As an application of our results, we obtain new guarantees on the performance of the best subset selection procedure in sparse linear regression under general assumptions.
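As a toy illustration of ERM over such a union of linear classes (a minimal sketch with our own choice of feature maps, data, and least-squares solver, not the paper's construction), the snippet below fits a linear predictor on top of each candidate feature map and returns the pair achieving the smallest empirical square loss, i.e., the joint "feature map plus linear predictor" empirical risk minimizer described above.

```python
import numpy as np

def erm_over_feature_maps(X, y, feature_maps):
    """ERM over the union of linear classes induced by a set of feature maps.

    feature_maps: dict mapping an index t in T to a callable phi_t(X) -> features.
    Returns the best feature-map index, its coefficients, and its empirical risk.
    """
    best = None
    for t, phi in feature_maps.items():
        Phi = phi(X)                                     # n x d_t feature matrix
        beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares fit
        risk = np.mean((Phi @ beta - y) ** 2)            # empirical square loss
        if best is None or risk < best[2]:
            best = (t, beta, risk)
    return best

# Example with two candidate feature maps (raw vs. squared coordinates).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X ** 2) @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(200)
maps = {"linear": lambda Z: Z, "quadratic": lambda Z: Z ** 2}
t_hat, beta_hat, risk_hat = erm_over_feature_maps(X, y, maps)
print(t_hat, risk_hat)
```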


Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance

Neural Information Processing Systems

Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property.
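As a small, hedged sketch of how such a robustness check could look in practice (the toy model, the gradient-saliency explanation, the cyclic-shift group, and the cosine-similarity score are our assumptions, not the paper's exact metrics), one can compare the explanation of a transformed input with the transformed explanation of the original input: for a model whose predictions are invariant to the group, a faithful feature attribution should score highly on this equivariance test.

```python
import torch

def saliency(model, x):
    """Simple gradient explanation: d(output)/d(input)."""
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()
    return x.grad.detach()

def explanation_equivariance(model, x, group_actions):
    """Mean cosine similarity between explain(g(x)) and g(explain(x))."""
    base = saliency(model, x)
    scores = []
    for g in group_actions:
        lhs = saliency(model, g(x)).flatten()    # explanation of transformed input
        rhs = g(base).flatten()                  # transformed explanation
        scores.append(torch.nn.functional.cosine_similarity(lhs, rhs, dim=0))
    return torch.stack(scores).mean()

# Toy translation-invariant model: circular conv + global average pooling,
# tested against the group of cyclic shifts along the last dimension.
model = torch.nn.Sequential(
    torch.nn.Conv1d(1, 4, kernel_size=3, padding=1, padding_mode="circular"),
    torch.nn.AdaptiveAvgPool1d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(4, 1),
)
x = torch.randn(1, 1, 32)
shifts = [lambda z, s=s: torch.roll(z, shifts=s, dims=-1) for s in range(1, 32, 8)]
print(explanation_equivariance(model, x, shifts))
```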


Supplemental Material

Neural Information Processing Systems

To strengthen the design rationale for incorporating prompts instead of following recent methods [3], Table 1 compares, under the all-in-one setting, the use of the degradation embedding extracted from the Contrastive-learning Based Degradation Encoder (CBDE) of the AirNet [3] model against the use of prompt tokens in the PromptIR framework. We also show that it is important that the prompt block is used only on the decoder side: Table 2 compares, under the all-in-one setting, using the prompt block on both the encoder and decoder branches against using it only on the decoder branch. Figure 1 gives an overview of the Transformer block used in the PromptIR framework. The Transformer block is composed of two sub-modules: the Multi-Dconv Head Transposed Attention module (MDTA) and the Gated-Dconv Feed-forward Network (GDFN).


Statistical-Computational Trade-offs for Density Estimation

Neural Information Processing Systems

Recently, [1] gave the first and only known result that achieves sublinear bounds in both the sample complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors. Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved.