AITopics | model capacity

The φCurve: The Shape of Generalization through the Lens of Norm-based Capacity Control

Neural Information Processing SystemsJun-21-2026, 08:31:06 GMT

Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parametrized deep networks. Capacity measures based on parameter count typically fail to account for these empirical observations. To tackle this challenge, we consider norm-based capacity measures and develop our study for random features based estimators, widely used as simplified theoretical models for more complex networks. In this context, we provide a precise characterization of how the estimator's norm concentrates and how it governs the associated test error. Our results show that the predicted learning curve admits a phase transition from under-to over-parameterization, but no double descent behavior. This confirms that more classical U-shaped behavior is recovered considering appropriate capacity measures based on models norms rather than size. From a technical point of view, we leverage deterministic equivalence as the key tool and further develop new deterministic quantities which are of independent interest.

artificial intelligence, machine learning, regime, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government > Regional Government (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLMFine-tuning

Neural Information Processing SystemsJun-18-2026, 13:13:01 GMT

Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) enhance model capacity at the cost of more & under-utilized parameters. To address these limitations, we propose Structural Mixture of Residual Experts (S'MoRE), a novel framework that seamlessly integrates the efficiency of LoRA with the flexibility of MoE. Conceptually, S'MoRE employs hierarchical low-rank decomposition of expert weights, yielding residuals of varying orders interconnected in a multi-layer structure.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Add feedback

Fair Deepfake Detectors Can Generalize

Neural Information Processing SystemsJun-17-2026, 02:46:16 GMT

Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inversepropensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three crossdomain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(2 more...)

Add feedback

UMA: A Family of Universal Models for Atoms

Neural Information Processing SystemsJun-14-2026, 01:43:29 GMT

The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, we present a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning

Neural Information Processing SystemsJun-12-2026, 20:02:02 GMT

Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are efficient but lack flexibility, while Mixture-of-Experts (MoE) enhance model capacity at the cost of more & under-utilized parameters. To address these limitations, we propose Structural Mixture of Residual Experts (S'MoRE), a novel framework that seamlessly integrates the efficiency of LoRA with the flexibility of MoE. Conceptually, S'MoRE employs hierarchical low-rank decomposition of expert weights, yielding residuals of varying orders interconnected in a multi-layer structure.

artificial intelligence, large language model, natural language, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

2f00ecd787b432c1d36f3de9800728eb-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 07:44:39 GMT

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

20568692db622456cc42a2e853ca21f8-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 01:34:03 GMT

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

20568692db622456cc42a2e853ca21f8-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 01:34:00 GMT

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

0799492e7be38b66d10ead5e8809616d-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 10:10:11 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.68)
Europe (0.67)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Uncovering Neural Scaling Laws in Molecular Representation Learning

Neural Information Processing SystemsApr-24-2026, 07:36:31 GMT

Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing modelcentric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: