rcd
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for their deterministic (cyclic) and randomized variants that differ in the selection of update coordinates. These estimates suggest randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples and more generally problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement on the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
Stochastic Spectral and Conjugate Descent Methods
The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
- Asia > Russia (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- (3 more...)
- North America > United States (0.04)
- Asia > India (0.04)
Causal Intervention Sequence Analysis for Fault Tracking in Radio Access Networks
Shi, Chenhua, Philip, Joji, Bandyopadhyay, Subhadip, Choudhury, Jayanta
To keep modern Radio Access Networks (RAN) running smoothly, operators need to spot the real-world triggers behind Service-Level Agreement (SLA) breaches well before customers feel them. We introduce an AI/ML pipeline that does two things most tools miss: (1) finds the likely root-cause indicators and (2) reveals the exact order in which those events unfold. We start by labeling network data: records linked to past SLA breaches are marked `abnormal', and everything else `normal'. Our model then learns the causal chain that turns normal behavior into a fault. In Monte Carlo tests the approach pinpoints the correct trigger sequence with high precision and scales to millions of data points without loss of speed. These results show that high-resolution, causally ordered insights can move fault management from reactive troubleshooting to proactive prevention.
- Telecommunications (0.90)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.40)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for their deterministic (cyclic) and randomized variants that differ in the selection of update coordinates. These estimates suggest randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples and more generally problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement on the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Stochastic Spectral and Conjugate Descent Methods
The state-of-the-art methods for solving optimization problems in big dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves, and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We develop and analyze also inexact variants of these methods where the spectral and conjugate directions are allowed to be approximate only. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
Stochastic Spectral and Conjugate Descent Methods
Dmitry Kovalev, Peter Richtarik, Eduard Gorbunov, Elnur Gasanov
An increasing array of learning and training tasks reduce to optimization problem in very large dimensions. The state-of-the-art algorithms in this regime are based on randomized coordinate descent (RCD) . V arious acceleration strategies were proposed for RCD in the literature in recent years, based on techniques such as Nesterov's momentum [
- Asia > Russia (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- (3 more...)
Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Hamdani, Rajaa El, Haffoudhi, Samy, Holzenberger, Nils, Suchanek, Fabian, Bonald, Thomas, Malliaros, Fragkiskos D.
Language models (LMs) encode substantial factual knowledge, but often produce answers judged as incorrect. We hypothesize that many of these answers are actually correct, but are expressed in alternative surface forms that are dismissed due to an overly strict evaluation, leading to an underestimation of models' parametric knowledge. We propose Retrieval-Constrained Decoding (RCD), a decoding strategy that restricts model outputs to unique surface forms. We introduce YAGO-QA, a dataset of 19,137 general knowledge questions. Evaluating open-source LMs from 135M to 70B parameters, we show that standard decoding undervalues their knowledge. For instance, Llama-3.1-70B scores only 32.3% F1 with vanilla decoding but 46.0% with RCD. Similarly, Llama-3.1-8B reaches 33.0% with RCD, outperforming the larger model under vanilla decoding. We publicly share the code and dataset at https://github.com/Rajjaa/disambiguated-LLM.
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > China > Hong Kong (0.04)
- (17 more...)
- Media (1.00)
- Aerospace & Defense (0.93)
- Leisure & Entertainment > Sports > Soccer (0.69)