AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization

Neural Information Processing SystemsJun-18-2026, 09:52:19 GMT

Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, emerges as an important optimization paradigm for addressing a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex non-smooth FCCO, where the outer functions are non-smooth weakly convex or convex and the inner functions are smooth or weakly convex. Existing state-of-the-art result face two key limitations: (1) a high iteration complexity of O(1/ϵ6)under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are two fold: (i) We propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) We establish a new state-of-the-art iteration complexity of O(1/ϵ5). Moreover, we apply our algorithms to multiple inequality constrained non-convex optimization problems involving smooth or weakly convex functional inequality constraints. By optimizing a smoothed hinge penalty based formulation, we achieve a new state-of-the-art complexity of O(1/ϵ5)for finding an (nearly) ϵ-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Generalization Bounds for Kolmogorov-Arnold Networks (KANs)and Enhanced KANs with Lower Lipschitz Complexity

Neural Information Processing SystemsJun-18-2026, 08:11:29 GMT

Kolmogorov-Arnold Networks (KANs) have demonstrated remarkable expressive capacity and predictive power in symbolic learning. However, existing generalization errors of KANs primarily focus on approximation errors while neglecting estimation errors, leading to a suboptimal bias-variance trade-off and poor generalization performance. Meanwhile, the unclear generalization mechanism hinders the design of more effective KANs. As the authors of KANs highlighted, they "would like to explore ways to restrict KANs' hypothesis space so that they can achieve good performance." To address these challenges, we explore the generalization mechanism of KANs and design more effective KANs with lower model complexity and better generalization. We define Lipschitz complexity as the first structural measure for deep functions represented by KANs and derive novel generalization bounds based on Lipschitz complexity, establishing a theoretical foundation for understanding their generalization behavior. To reduce Lipschitz complexity and boost the generalization mechanism of KANs, we propose Lipschitz-Enhanced KANs (LipKANs) by integrating the Lip layers and pioneering the L1.5-regularization, contributing to tighter generalization bounds. Empirical experiments validate that the proposed LipKANs enhance the generalization mechanism of KANs when modeling complex distributions. We hope our theoretical insights and proposed LipKANs lay a foundation for the future development of KANs.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

G2M: AGeneralized Gaussian Mirror Method to boost feature selection power

Neural Information Processing SystemsJun-18-2026, 07:58:22 GMT

Recent advances in false discovery rate (FDR)-controlled feature selection methods have improved reliability by effectively limiting false positives, making them wellsuited for complex applications. A popular FDR-controlled framework called data splitting uses the "mirror statistics" to select features. However, we find that the unit variance assumption on mirror statistics could potentially limit the feature selection power. To address this, we generalize the mirror statistics in the Gaussian mirror framework and introduce a new approach called "generalized Gaussian mirror" (G2M), which adaptively learns the variance and forms new test statistics. We demonstrate both theoretically and empirically that the proposed test statistics achieve higher power than those of Gaussian mirror and data splitting. Comparisons with other FDR-controlled frameworks on synthetic, semi-synthetic, and real datasets highlight the superior performance of the G2M method in achieving higher power while maintaining FDR control. These findings suggest the potential for the G2M method for practical applications in real-world problems. Code is available at: https://github.com/skyve2012/G2M.

artificial intelligence, machine learning, statistics, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Transformers are almost optimal metalearners for linear classification

Neural Information Processing SystemsJun-18-2026, 07:37:26 GMT

Transformers have demonstrated impressive in-context learning (ICL) capabilities, raising the question of whether they can serve as metalearners that adapt to new tasks using only a small number of in-context examples, without any further training. While recent theoretical work has studied transformers' ability to perform ICL, most of these analyses do not address the formal metalearning setting, where the objective is to solve a collection of related tasks more efficiently than would be possible by solving each task individually. In this paper, we provide the first theoretical analysis showing that a simplified transformer architecture trained via gradient descent can act as a near-optimal metalearner in a linear classification setting. We consider a natural family of tasks where each task corresponds to a class-conditional Gaussian mixture model, with the mean vectors lying in a shared k-dimensional subspace of Rd. After training on a sufficient number of such tasks, we show that the transformer can generalize to a new task using only eO(k/eR4) in-context examples, where eR denotes the signal strength at test time. This performance (almost) matches that of an optimal learner that knows exactly the shared subspace and significantly outperforms any learner that only has access to the in-context data, which requires Ω(d/eR4) examples to generalize. Importantly, our bounds on the number of training tasks and examples per task needed to achieve this result are independent of the ambient dimension d.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Sketch-Augmented Features Improve Learning Long-Range Dependencies in Graph Neural Networks

Neural Information Processing SystemsJun-18-2026, 07:31:42 GMT

Graph Neural Networks learn on graph-structured data by iteratively aggregating local neighborhood information. While this local message passing paradigm imparts a powerful inductive bias and exploits graph sparsity, it also yields three key challenges: (i) oversquashing of long-range information, (ii) oversmoothing of node representations, and (iii) limited expressive power. In this work we inject randomized global embeddings of node features, which we term Sketched Random Features, into standard GNNs, enabling them to efficiently capture long-range dependencies. The embeddings are unique, distance-sensitive, and topology-agnostic--properties which we analytically and empirically show alleviate the aforementioned limitations when injected into GNNs. Experimental results on real-world graph learning tasks confirm that this strategy consistently improves performance over baseline GNNs, offering both a standalone solution and a complementary enhancement to existing techniques such as graph positional encodings.

artificial intelligence, machine learning, neural network, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning Efficient Fuse-and-Refine for Feed-Forward 3DGaussian Splatting

Neural Information Processing SystemsJun-18-2026, 07:18:15 GMT

Recent advances in feed-forward 3DGaussian Splatting have led to rapid improvements in efficient scene reconstruction from sparse views. However, most existing approaches construct Gaussian primitives directly aligned with the pixels in one or more of the input images. This leads to redundancies in the representation when input views overlap and constrains the position of the primitives to lie along the input rays without full flexibility in 3D space. Moreover, these pixel-aligned approaches do not naturally generalize to dynamic scenes, where effectively leveraging temporal information requires resolving both redundant and newly appearing content across frames. To address these limitations, we introduce a novel Fuseand-Refine module that enhances existing feed-forward models by merging and refining the primitives in a canonical 3D space.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
(2 more...)

Add feedback

Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment

Neural Information Processing SystemsJun-18-2026, 07:17:52 GMT

Test-time adaptation (TTA) enhances the zero-shot robustness under distribution shifts by leveraging unlabeled test data during inference. Despite notable advances, several challenges still limit its broader applicability. First, most methods rely on backpropagation or iterative optimization, which limits scalability and hinders real-time deployment. Second, they lack explicit modeling of class-conditional feature distributions. This modeling is crucial for producing reliable decision boundaries and calibrated predictions, but it remains underexplored due to the lack of both source data and supervision at test time.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
(2 more...)

Add feedback

Fast exact recovery of noisy matrix from few entries: the infinity norm approach

Neural Information Processing SystemsJun-18-2026, 05:48:30 GMT

The matrix recovery (completion) problem, a central problem in data science, involves recovering a matrix Afrom a relatively small random set of entries. While such a task is generally impossible, it has been shown that one can recover A exactly in polynomial time, with high probability, under three basic and necessary assumptions: (1) the rank of A is very small compared to its dimensions (low rank), (2) A has delocalized singular vectors (incoherence), and (3) the sample size is sufficiently large. Various algorithms address this task, including convex optimization by Candes, Recht, and Tao (2009), alternating projection by Hardt and Wooters (2014), and low-rank approximation with gradient descent by Keshavan, Montanari, and Oh (2009, 2010). In applications, Candes and Plan (2009) noted that it is more realistic to assume noisy observations. In such cases, the above approaches provide approximate recovery with small root mean square error, which is difficult to convert into exact recovery.

artificial intelligence, assumption, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Personalized Bayesian Federated Learning with Wasserstein Barycenter Aggregation

Neural Information Processing SystemsJun-18-2026, 05:42:39 GMT

Personalized Bayesian federated learning (PBFL) handles non-i.i.d.

artificial intelligence, machine learning, particle, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.47)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Controlling the Flow: Stability and Convergence for Stochastic Gradient Descent with Decaying Regularization

Neural Information Processing SystemsJun-18-2026, 05:34:34 GMT

The present article studies the minimization of convex, L-smooth functions defined on a separable real Hilbert space. We analyze regularized stochastic gradient descent (reg-SGD), a variant of stochastic gradient descent that uses a Tikhonov regularization with time-dependent, vanishing regularization parameter. We prove strong convergence of reg-SGD to the minimum-norm solution of the original problem without additional boundedness assumptions. Moreover, we quantify the rate of convergence and optimize the interplay between step-sizes and regularization decay. Our analysis reveals how vanishing Tikhonov regularization controls the flow of SGD and yields stable learning dynamics, offering new insights into the design of iterative algorithms for convex problems, including those that arise in ill-posed inverse problems.

artificial intelligence, convergence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback