AITopics | primal

Calibrate and Boost Logical Expressiveness of GNN Over Multi-Relational and Temporal Graphs

Neural Information Processing Systems Feb-17-2026, 06:53:29 GMT

This transformation enables R2-GNN to effectively capture any FOC2 classifiers when applied to the "transformed" input graph. It cannot answer which Boolean node classifier can be expressed by GNNs.

artificial intelligence, graph, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


cd687a58a13b673eea3fc1b2e4944cf7-Paper-Conference.pdf

Neural Information Processing Systems Feb-17-2026, 04:22:03 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)


Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints

Neural Information Processing Systems Feb-9-2026, 17:47:50 GMT

Regularization is an important tool in machine learning, used, for instance, to reduce overfitting [23].

artificial intelligence, constraint, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


26d4b4313a7e5828856bc0791fca39a2-Paper.pdf

Neural Information Processing Systems Feb-7-2026, 22:45:03 GMT

phase transition, probability, transition, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.52)


Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs

Defazio, Aaron, Mishchenko, Konstantin, Raman, Parameswaran, Shi, Hao-Jun Michael, Xiao, Lin

arXiv.org Machine Learning Dec-22-2025

We propose Generalized Primal Averaging (GPA), an extension of Nesterov's method in its primal averaging formulation that addresses key limitations of recent averaging-based optimizers such as single-worker DiLoCo and Schedule-Free (SF) in the non-distributed setting. These two recent algorithmic approaches improve the performance of base optimizers, such as AdamW, through different iterate averaging strategies. Schedule-Free explicitly maintains a uniform average of past weights, while single-worker DiLoCo performs implicit averaging by periodically aggregating trajectories, called pseudo-gradients, to update the model parameters. However, single-worker DiLoCo's periodic averaging introduces a two-loop structure, increasing its memory requirements and number of hyperparameters. GPA overcomes these limitations by decoupling the interpolation constant in the primal averaging formulation of Nesterov. This decoupling enables GPA to smoothly average iterates at every step, generalizing and improving upon single-worker DiLoCo. Empirically, GPA consistently outperforms single-worker DiLoCo while removing the two-loop structure, simplifying hyperparameter tuning, and reducing its memory overhead to a single additional buffer. On the Llama-160M model, GPA provides a 24.22% speedup in terms of steps to reach the baseline (AdamW's) validation loss. Likewise, GPA achieves speedups of 12% and 27% on small and large batch setups, respectively, to attain AdamW's validation accuracy on the ImageNet ViT workload. Furthermore, we prove that for any base optimizer with regret bounded by $O(\sqrt{T})$, where $T$ is the number of iterations, GPA can match or exceed the convergence guarantee of the original optimizer, depending on the choice of interpolation constants.
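For orientation, the uniform iterate averaging that the abstract attributes to Schedule-Free can be sketched on a toy scalar problem. This is a minimal sketch, assuming the usual Schedule-Free SGD recursion; it is not the GPA algorithm, whose decoupled interpolation constants are not specified in this listing. The function name, the quadratic objective, and the values of `lr`, `beta`, and `steps` are assumptions chosen for illustration.

```python
def schedule_free_sgd(grad, z0, lr=0.5, beta=0.9, steps=2000):
    """Toy scalar Schedule-Free SGD: x is the uniform average of the z iterates."""
    z = z0  # base SGD iterate
    x = z0  # averaged iterate (the one that would be evaluated)
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # gradient is taken at an interpolation of z and x
        z = z - lr * grad(y)             # plain SGD step applied to z
        c = 1.0 / (t + 1)                # uniform-averaging weight
        x = (1.0 - c) * x + c * z        # running uniform average of the z sequence
    return x


# Hypothetical objective f(w) = 0.5 * (w - 3)^2, so grad f(w) = w - 3.
x_final = schedule_free_sgd(lambda w: w - 3.0, z0=0.0)
```

In this sketch the averaging weight `c` is tied to the step count, which is what makes the average uniform; the abstract describes GPA as decoupling an interpolation constant of this kind so that averaging happens smoothly at every step.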

diloco, formulation, optimizer, (15 more...)

arXiv.org Machine Learning

2512.17131

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Primal: A Unified Deterministic Framework for Quasi-Orthogonal Hashing and Manifold Learning

Khasia, Vladimer

arXiv.org Artificial Intelligence Nov-27-2025

We present Primal, a deterministic feature mapping framework that harnesses the number-theoretic independence of prime square roots to construct robust, tunable vector representations. Diverging from standard stochastic projections (e.g., Random Fourier Features), our method exploits the Besicovitch property to create irrational frequency modulations that guarantee infinite non-repeating phase trajectories. We formalize two distinct algorithmic variants: (1) StaticPrime, a sequence generation method that produces temporal position encodings empirically approaching the theoretical Welch bound for quasi-orthogonality; and (2) DynamicPrime, a tunable projection layer for input-dependent feature mapping. A central novelty of the dynamic framework is its ability to unify two disparate mathematical utility classes through a single scaling parameter σ. In the low-frequency regime, the method acts as an isometric kernel map, effectively linearizing non-convex geometries (e.g., spirals) to enable high-fidelity signal reconstruction and compressive sensing. Conversely, the high-frequency regime induces chaotic phase wrapping, transforming the projection into a maximum-entropy one-way hash suitable for Hyperdimensional Computing and privacy-preserving Split Learning. Empirical evaluations demonstrate that our framework yields superior orthogonality retention and distribution tightness compared to normalized Gaussian baselines, establishing it as a computationally efficient, mathematically rigorous alternative to random matrix projections. The code is available at https://github.com/VladimerKhasia/primal
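The general idea of using square roots of primes as mutually incommensurate frequencies can be sketched as follows. This is a minimal sketch under stated assumptions, not the StaticPrime or DynamicPrime implementation from the linked repository: the function names, the cosine feature form, the normalization, and the way `sigma` scales the frequencies are all hypothetical choices made for illustration.

```python
import math


def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_sqrt_features(t, dim=8, sigma=1.0):
    """Map a scalar t to dim cosine features with frequencies sigma * sqrt(p_k)."""
    scale = math.sqrt(2.0 / dim)
    return [scale * math.cos(2.0 * math.pi * sigma * math.sqrt(p) * t)
            for p in first_primes(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Calling `cosine_similarity(prime_sqrt_features(1.0), prime_sqrt_features(2.0))` for different positions, and for small versus large `sigma`, is one way to explore how the scaling parameter changes how strongly encodings of nearby inputs overlap.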

artificial intelligence, machine learning, regime, (18 more...)

arXiv.org Artificial Intelligence

2511.20839

Genre: Research Report (0.82)

Industry: Education (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Schedulers for Schedule-free: Theoretically inspired hyperparameters

Pun, Yuen-Man, Buchholz, Matthew, Gower, Robert M.

arXiv.org Artificial Intelligence Nov-12-2025

The recently proposed schedule-free method has been shown to achieve strong performance when hyperparameter tuning is limited. The current theory for schedule-free only supports a constant learning rate, whereas the implementation used in practice uses a warm-up schedule. We show how to extend the last-iterate convergence theory of schedule-free to allow for any scheduler, and how the averaging parameter has to be updated as a function of the learning rate. We then perform experiments showing that our convergence theory has some predictive power with regard to practical executions on deep neural networks, even though this theory relies on assuming convexity. When applied to the warmup-stable-decay (wsd) schedule, our theory shows the optimal convergence rate of $\mathcal{O}(1/\sqrt{T})$. We then use convexity to design a new adaptive Polyak learning rate schedule for schedule-free. We prove an optimal anytime last-iterate convergence for our new Polyak schedule, and show that it performs well compared to a number of baselines on a black-box model distillation task.
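The shape of a warmup-stable-decay schedule can be sketched in a few lines. This is a minimal sketch, assuming linear warm-up and linear decay; the abstract does not state which warm-up or decay shapes the paper analyzes, and the function name, parameter names, and default values here are hypothetical.

```python
def wsd_learning_rate(step, total_steps, peak_lr=1e-3, warmup_steps=10, decay_steps=20):
    """Learning rate at a 0-indexed step of a warmup-stable-decay schedule."""
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warm-up phase: ramp linearly up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: ramp linearly down towards zero.
    return peak_lr * (total_steps - step) / decay_steps


schedule = [wsd_learning_rate(s, total_steps=100) for s in range(100)]
```

The list `schedule` then holds one learning rate per step and can be passed to whatever training loop is in use.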

artificial intelligence, convergence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.07767

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report (0.40)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


cd687a58a13b673eea3fc1b2e4944cf7-Paper-Conference.pdf

Neural Information Processing Systems Oct-9-2025, 07:49:05 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)


Support vector machines and linear regression coincide with very high-dimensional features

Neural Information Processing Systems Oct-3-2025, 00:15:58 GMT

The hard-margin SVM is a linear classification model that finds the separating hyperplane that maximizes the minimum margin of error for every training sample.
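The two problems named in the title can be written side by side in their standard forms. The notation is an assumption made here, not taken from the paper: features $x_i$, labels $y_i \in \{-1, +1\}$, weight vector $w$, and no bias term.

```latex
\begin{aligned}
\text{hard-margin SVM:}\quad
  & \min_{w}\ \tfrac{1}{2}\lVert w\rVert_2^2
  \quad \text{s.t.}\quad y_i\, w^\top x_i \ge 1, \quad i = 1,\dots,n, \\
\text{minimum-norm interpolation:}\quad
  & \min_{w}\ \tfrac{1}{2}\lVert w\rVert_2^2
  \quad \text{s.t.}\quad w^\top x_i = y_i, \quad i = 1,\dots,n.
\end{aligned}
```

Because $y_i^2 = 1$, the equality $w^\top x_i = y_i$ is the same as $y_i\, w^\top x_i = 1$. The two solutions therefore coincide whenever every SVM constraint is active at the optimum, that is, when every training sample is a support vector.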

artificial intelligence, machine learning, transition, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems Oct-2-2025, 23:02:14 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose an accelerated proximal block coordinate descent algorithm, describe its application to standard regularized loss minimization problems, and conclude with experiments on a smoothed SVM. On the question of clarity: I found the paper on the whole difficult to follow, with the authors showing a marked preference for writing equations in lieu of explanations. There are also numerous small grammatical errors. I'm not aware of other algorithms that are designed to work on block-coordinate problems (although single-coordinate algorithms are common enough), and have to question the advantage of this formulation, aside from being slightly more general. Given that the application considered in section 4 is single-coordinate (am I correct about this?), it might simplify the presentation to work from a single-coordinate formulation, and merely mention that block-coordinate updates are also possible.

algorithm, experiment, gradient method, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)
