AITopics | Statistical Learning

e812af67a942c21dd0104bd929f99da1-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 03:40:02 GMT

data mining, machine learning, ood data, (19 more...)

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
(2 more...)

Unsupervised Graph Neural Architecture Search with Disentangled Self-supervision (Appendix)

Neural Information Processing Systems, Apr-30-2026, 03:38:28 GMT

B.1 Complexity Analysis. Denote the number of nodes and edges in the graph as |V| and |E|, the number of latent factors as K, the number of operation choices as |O|, and the dimensionality of hidden representations as d. The time complexity of the disentangled super-network is O(K|E|d + K|V|d²), where the computation for each factor is fully parallelizable and amenable to GPU acceleration, and K is usually a small constant. The time complexities of the self-supervised training and contrastive search modules are both O(K²d²). As architectures under different factors share parameters, the number of learnable parameters is the same as in a classical graph super-network, i.e., O(|O|d²). Therefore, the complexity of our method is comparable to that of classical GNAS methods.
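
To get a feel for which term dominates, the bounds above can be evaluated for concrete sizes. The following is only a rough sketch: the function names are hypothetical, constant factors are ignored, and reading the first term as per-edge message passing and the second as per-node feature transformation is an assumption based on how GNN layers are usually costed, not something the appendix states.

```python
def supernet_cost(num_nodes, num_edges, num_factors, hidden_dim):
    """Leading-order operation count for the disentangled super-network.

    Mirrors K*|E|*d + K*|V|*d^2 with all constant factors dropped.
    """
    # Assumed interpretation: one d-dimensional message per edge and factor.
    edge_term = num_factors * num_edges * hidden_dim
    # Assumed interpretation: one d x d transform per node and factor.
    node_term = num_factors * num_nodes * hidden_dim ** 2
    return edge_term + node_term


def self_supervised_cost(num_factors, hidden_dim):
    """Leading-order count K^2 * d^2 for the training and search modules."""
    return num_factors ** 2 * hidden_dim ** 2


# Hypothetical graph: 1,000 nodes, 5,000 edges, K = 4 factors, d = 64.
total_supernet = supernet_cost(1000, 5000, 4, 64)
total_ssl = self_supervised_cost(4, 64)
```

For these illustrative sizes the node term is much larger than the edge term, and the K²d² modules are small by comparison, which is consistent with treating K as a small constant.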

artificial intelligence, machine learning, representation, (13 more...)

Country: Europe > Greece (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

How to Scale Your EMA

Neural Information Processing Systems, Apr-30-2026, 03:38:07 GMT

Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model, whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum hyperparameter. This model EMA can improve the robustness and generalization of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have not considered the optimization of the model EMA when performing scaling, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of a model EMA and demonstrate the rule's validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, a 6× wall-clock time reduction under idealized hardware settings.
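
A minimal sketch of how such a scaling rule might be applied. The abstract gives the linear learning-rate rule for SGD but does not spell out the EMA rule; the sketch assumes that, when the batch size is scaled by a factor κ, the EMA momentum ρ is replaced by ρ^κ. The function names `scale_hyperparameters` and `ema_update` are hypothetical and the parameters are plain Python lists.

```python
def scale_hyperparameters(lr, ema_momentum, kappa):
    """Scale the SGD learning rate and EMA momentum for a batch size scaled by kappa.

    Assumption (not stated in the abstract): momentum rho becomes rho ** kappa.
    """
    return lr * kappa, ema_momentum ** kappa


def ema_update(ema_params, target_params, momentum):
    """One EMA step: each EMA parameter moves towards its target parameter."""
    return [momentum * e + (1.0 - momentum) * t
            for e, t in zip(ema_params, target_params)]


# Sanity check on a frozen target: kappa small steps with momentum rho
# land on the same point as one step with momentum rho ** kappa.
rho, kappa = 0.99, 4
target = [1.0, -2.0]

small_steps = [0.0, 0.0]
for _ in range(kappa):
    small_steps = ema_update(small_steps, target, rho)

_, rho_scaled = scale_hyperparameters(lr=0.1, ema_momentum=rho, kappa=kappa)
one_big_step = ema_update([0.0, 0.0], target, rho_scaled)
```

With a frozen target the two trajectories coincide up to floating-point error. During real training the target model moves between steps, so the match is expected to be approximate rather than exact.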

artificial intelligence, ema scaling rule, machine learning, (17 more...)

Country:

Europe (1.00)
North America > United States > California (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Bayesian Learning via Q-Exponential Process

Neural Information Processing Systems, Apr-30-2026, 03:22:39 GMT

Regularization is one of the most fundamental topics in optimization, statistics and machine learning. To get sparsity in estimating a parameter u ∈ ℝ^d, an ℓq penalty term, ‖u‖_q, is usually added to the objective function. What is the probabilistic distribution corresponding to such an ℓq penalty? What is the correct stochastic process corresponding to ‖u‖_q when we model functions u ∈ L^q? This is important for statistically modeling high-dimensional objects such as images, with penalty to preserve certain properties, e.g.

artificial intelligence, machine learning, q-ep, (19 more...)

Country: North America > United States (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.82)

e6bfdd58f1326ff821a1b92743963bdf-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 03:22:37 GMT

artificial intelligence, machine learning, q-ep, (17 more...)

Country: North America > United States (0.28)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

e69a9560c450ca76584d9eb37e7f5ae8-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 03:10:45 GMT

large language model, machine learning, natural language, (18 more...)

Country: North America > United States > Minnesota (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Neural Information Processing Systems, Apr-30-2026, 02:55:51 GMT

Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with ℓp-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.

artificial intelligence, data mining, machine learning, (15 more...)

Country: North America > United States (0.94)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

e5440ffceaf4831b5f98652b8a27ffde-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 02:54:14 GMT

machine learning, natural language, target model, (19 more...)

Country:

Asia (0.46)
Europe (0.45)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

e4d3fe32495088805bbbb4f1de63e947-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 02:42:11 GMT

artificial intelligence, inequality, machine learning, (18 more...)

Country: North America > United States > California > Los Angeles County > Los Angeles (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Resetting the Optimizer in Deep RL: An Empirical Study

Neural Information Processing Systems, Apr-30-2026, 02:41:49 GMT

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of solving a sequence of optimization problems where the loss function changes per iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters, such as estimates of the first-order and second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
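
The resetting idea can be sketched with a hand-written Adam. This is an illustration under stated assumptions, not the authors' implementation: the class name `ResettableAdam` is hypothetical, the update follows the standard Adam equations with bias correction, and a toy sequence of quadratic losses stands in for the changing value-function objectives.

```python
import math


class ResettableAdam:
    """Plain-Python Adam whose internal state can be cleared on demand."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self):
        """Forget the first and second moment estimates and the step count."""
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        """Return updated parameters after one Adam step."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy sequence of problems: minimise (x - c)^2 for a changing target c.
opt = ResettableAdam(lr=0.1)
params = [0.0]
for c in [1.0, -1.0]:
    opt.reset()  # start each new problem with clean moment estimates
    for _ in range(50):
        grads = [2.0 * (params[0] - c)]
        params = opt.step(params, grads)
```

Without the `reset()` call, the moment estimates accumulated while chasing the first target would carry over into the first updates on the second problem, which is the contamination the abstract describes.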

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
