
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

Neural Information Processing Systems

The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in how the update coordinates are selected. These estimates suggest that randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear support for this comparison. In this paper, we provide examples, and more generally problem classes, for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also characterize the best deterministic order (the one that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
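To make the CCD/RCD comparison concrete, here is a minimal sketch on a convex quadratic: the two methods share the same coordinate-minimization update and differ only in the selection rule. The matrix A below is an arbitrary illustrative instance, not one of the worst-case problem classes constructed in the paper.

```python
# Cyclic vs. randomized coordinate descent on f(x) = 0.5 x^T A x - b^T x.
import numpy as np

rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # symmetric positive definite Hessian
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)   # exact minimizer, for measuring error

def cd(order_fn, iters=2000):
    """Run coordinate descent; order_fn(k) returns the coordinate to update at step k."""
    x = np.zeros(n)
    for k in range(iters):
        i = order_fn(k)
        # Exact minimization over coordinate i: grad_i = (A x - b)_i
        x[i] -= (A[i] @ x - b[i]) / A[i, i]
    return np.linalg.norm(x - x_star)

ccd_err = cd(lambda k: k % n)            # CCD: fixed cyclic order
rcd_err = cd(lambda k: rng.integers(n))  # RCD: uniform random coordinate
print(f"error after 2000 steps  CCD: {ccd_err:.3e}  RCD: {rcd_err:.3e}")
```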


Stochastic Spectral and Conjugate Descent Methods

Neural Information Processing Systems

The state-of-the-art methods for solving high-dimensional optimization problems are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on augmenting the set of coordinate directions with a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We also develop and analyze inexact variants of these methods, in which the spectral and conjugate directions are allowed to be only approximate. We motivate this development by proving several negative results which highlight the limitations of RCD with importance sampling.
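A minimal sketch of the augmentation idea, written as my own illustration rather than the authors' method: the direction sampled at each step is drawn from the coordinate basis extended with a few eigenvectors ("spectral directions") of the Hessian, and each step does an exact line search along the chosen direction.

```python
# Random direction descent on f(x) = 0.5 x^T A x - b^T x, sampling from
# coordinate directions plus a few spectral directions of A.
import numpy as np

rng = np.random.default_rng(1)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

num_spectral = 5
_, eigvecs = np.linalg.eigh(A)
# Coordinate basis augmented with eigenvectors of the largest eigenvalues.
directions = np.vstack([np.eye(n), eigvecs[:, -num_spectral:].T])

x = np.zeros(n)
for _ in range(2000):
    d = directions[rng.integers(len(directions))]
    g = A @ x - b                    # gradient at the current iterate
    x -= (d @ g) / (d @ A @ d) * d   # exact line search along d
print(f"final error: {np.linalg.norm(x - x_star):.3e}")
```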



Root Cause Analysis of Failures in Microservices through Causal Discovery

Neural Information Processing Systems

Our solution is application agnostic and relies only on the data collected for diagnosis. For the evaluation, we compare the proposed solution with a modified version of the PC algorithm and with the state of the art for root cause analysis.
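For readers unfamiliar with the PC baseline mentioned above, here is a minimal sketch of its skeleton phase on service metrics, using partial correlation as the conditional-independence test. This is a generic textbook version, not the modified variant the paper compares against; the column semantics (per-service metrics) are an assumption for illustration.

```python
# Skeleton phase of a PC-style causal discovery algorithm.
import itertools
import numpy as np
from scipy import stats

def partial_corr(data, i, j, cond):
    """Correlation between columns i and j after regressing out columns in `cond`."""
    xi, xj = data[:, i], data[:, j]
    if cond:
        Z = np.column_stack([data[:, k] for k in cond] + [np.ones(len(data))])
        xi = xi - Z @ np.linalg.lstsq(Z, xi, rcond=None)[0]
        xj = xj - Z @ np.linalg.lstsq(Z, xj, rcond=None)[0]
    return np.corrcoef(xi, xj)[0, 1]

def pc_skeleton(data, alpha=0.05, max_cond=1):
    n_samples, n_vars = data.shape
    adj = {(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)}
    for size in range(max_cond + 1):
        for (i, j) in list(adj):
            others = [k for k in range(n_vars) if k not in (i, j)]
            for cond in itertools.combinations(others, size):
                r = partial_corr(data, i, j, cond)
                # Fisher z-transform gives an approximate independence p-value.
                z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n_samples - len(cond) - 3)
                if 2 * (1 - stats.norm.cdf(abs(z))) > alpha:
                    adj.discard((i, j))   # i and j independent given cond: drop edge
                    break
    return adj

# Toy usage: columns stand in for per-service metrics (latency, error rate, ...).
metrics = np.random.default_rng(2).standard_normal((500, 5))
print(pc_skeleton(metrics))
```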


Causal Intervention Sequence Analysis for Fault Tracking in Radio Access Networks

Shi, Chenhua, Philip, Joji, Bandyopadhyay, Subhadip, Choudhury, Jayanta

arXiv.org Artificial Intelligence

To keep modern Radio Access Networks (RAN) running smoothly, operators need to spot the real-world triggers behind Service-Level Agreement (SLA) breaches well before customers feel them. We introduce an AI/ML pipeline that does two things most tools miss: (1) finds the likely root-cause indicators and (2) reveals the exact order in which those events unfold. We start by labeling network data: records linked to past SLA breaches are marked `abnormal', and everything else `normal'. Our model then learns the causal chain that turns normal behavior into a fault. In Monte Carlo tests the approach pinpoints the correct trigger sequence with high precision and scales to millions of data points without loss of speed. These results show that high-resolution, causally ordered insights can move fault management from reactive troubleshooting to proactive prevention.
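A minimal sketch of the labeling step described above, under assumed column names (`timestamp`, `cell_id`) and an assumed rule that any KPI record within a fixed window before a breach on the same cell is abnormal; the paper does not specify these details.

```python
import pandas as pd

WINDOW = pd.Timedelta(minutes=15)   # assumed lead window, not from the paper

def label_records(kpi: pd.DataFrame, breaches: pd.DataFrame) -> pd.DataFrame:
    """Mark KPI rows preceding an SLA breach on the same cell as abnormal."""
    kpi = kpi.copy()
    kpi["label"] = "normal"
    for _, br in breaches.iterrows():
        mask = (
            (kpi["cell_id"] == br["cell_id"])
            & (kpi["timestamp"] >= br["timestamp"] - WINDOW)
            & (kpi["timestamp"] <= br["timestamp"])
        )
        kpi.loc[mask, "label"] = "abnormal"
    return kpi

# Tiny usage: the 10:50 row falls inside the window before the 11:00 breach.
kpi = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 10:50"]),
    "cell_id": ["A", "A"],
})
breaches = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 11:00"]),
    "cell_id": ["A"],
})
print(label_records(kpi, breaches))
```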



Stochastic Spectral and Conjugate Descent Methods

Dmitry Kovalev, Peter Richtarik, Eduard Gorbunov, Elnur Gasanov

Neural Information Processing Systems

An increasing array of learning and training tasks reduce to optimization problems in very large dimensions. The state-of-the-art algorithms in this regime are based on randomized coordinate descent (RCD). Various acceleration strategies have been proposed for RCD in the literature in recent years, based on techniques such as Nesterov's momentum [...]


Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models

Hamdani, Rajaa El, Haffoudhi, Samy, Holzenberger, Nils, Suchanek, Fabian, Bonald, Thomas, Malliaros, Fragkiskos D.

arXiv.org Artificial Intelligence

Language models (LMs) encode substantial factual knowledge, but often produce answers judged as incorrect. We hypothesize that many of these answers are actually correct, but are expressed in alternative surface forms that are dismissed due to an overly strict evaluation, leading to an underestimation of models' parametric knowledge. We propose Retrieval-Constrained Decoding (RCD), a decoding strategy that restricts model outputs to unique surface forms. We introduce YAGO-QA, a dataset of 19,137 general knowledge questions. Evaluating open-source LMs from 135M to 70B parameters, we show that standard decoding undervalues their knowledge. For instance, Llama-3.1-70B scores only 32.3% F1 with vanilla decoding but 46.0% with RCD. Similarly, Llama-3.1-8B reaches 33.0% with RCD, outperforming the larger model under vanilla decoding. We publicly share the code and dataset at https://github.com/Rajjaa/disambiguated-LLM.
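A toy sketch of the idea behind constrained decoding: at each step the decoder may only emit tokens that keep the output inside a set of allowed surface forms, organized as a prefix trie. RCD as described above constrains a real LM's decoder to retrieved entity surface forms; the trie contents and the stub scorer below are illustrative stand-ins, not the paper's implementation.

```python
# Greedy decoding restricted to a trie of allowed surface forms.
allowed = ["paris", "paris , france", "france"]  # hypothetical candidate forms

def build_trie(phrases):
    """Nest dicts token-by-token; '<eos>' marks a complete surface form."""
    trie = {}
    for phrase in phrases:
        node = trie
        for tok in phrase.split():
            node = node.setdefault(tok, {})
        node["<eos>"] = {}
    return trie

def toy_scores(prefix, candidates):
    # Stub in place of an LM's next-token log-probabilities:
    # here it simply prefers shorter tokens.
    return {tok: -len(tok) for tok in candidates}

def constrained_greedy_decode(trie):
    out, node = [], trie
    while True:
        candidates = list(node)          # only tokens the trie allows here
        scores = toy_scores(out, candidates)
        best = max(candidates, key=scores.get)
        if best == "<eos>":
            return " ".join(out)
        out.append(best)
        node = node[best]

print(constrained_greedy_decode(build_trie(allowed)))  # -> "paris , france"
```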