convex
SPFL: Sequential Updates with Parallel Aggregation for Enhanced Federated Learning Under Category and Domain Shifts
Federated Learning (FL) has recently emerged as the primary approach to overcoming data silos, enabling collaborative model training without sharing sensitive or proprietary data. Parallel Federated Learning (PFL) aggregates models trained independently on each client's local data, which could prevent the model from converging to the optimal solution due to limited data exposure. In contrast, Sequential Federated Learning (SFL) allows models to traverse client datasets sequentially, enhancing data utilization. However, SFL effectiveness is limited in real-world Non-IID scenarios characterized by category shift (inconsistent class distributions) and domain shift (distribution discrepancies). These shifts cause two critical issues: update order sensitivity, where model performance varies significantly with the sequence of client updates; and catastrophic forgetting, where the model forgets previously learned features when trained on new client data. Therefore, based on SFL, we propose a novel updating framework, SPFL (Sequential updates with Parallel aggregation Federated Learning), that can be integrated into existing PFL methods.
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting--particularly with large (constant) stepsizes--has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems.
Optimal Best Arm Identification under Differential Privacy
Best Arm Identification (BAI) algorithms are deployed in data-sensitive applications, such as adaptive clinical trials or user studies. Driven by the privacy concerns of these applications, we study the problem of fixed-confidence BAI under global Differential Privacy (DP) for Bernoulli distributions. While numerous asymptotically optimal BAI algorithms exist in the non-private setting, a significant gap remains between the best lower and upper bounds in the global DP setting. This work reduces this gap to a small multiplicative constant, for any privacy budget ϵ. First, we provide a tighter lower bound on the expected sample complexity of any δ-correct and ϵ-global DP strategy.
Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization
Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, emerges as an important optimization paradigm for addressing a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex non-smooth FCCO, where the outer functions are non-smooth weakly convex or convex and the inner functions are smooth or weakly convex. Existing state-of-the-art result face two key limitations: (1) a high iteration complexity of O(1/ϵ6)under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are two fold: (i) We propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) We establish a new state-of-the-art iteration complexity of O(1/ϵ5). Moreover, we apply our algorithms to multiple inequality constrained non-convex optimization problems involving smooth or weakly convex functional inequality constraints. By optimizing a smoothed hinge penalty based formulation, we achieve a new state-of-the-art complexity of O(1/ϵ5)for finding an (nearly) ϵ-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.
Conformal Risk Training: End-to-End Optimization of Conformal Risk Control
While deep learning models often achieve high predictive accuracy, their predictions typically do not come with any provable guarantees on risk or reliability, which are critical for deployment in high-stakes applications. The framework of conformal risk control (CRC) provides a distribution-free, finite-sample method for controlling the expected value of any bounded monotone loss function and can be conveniently applied post-hoc to any pre-trained deep learning model. However, many realworld applications are sensitive to tail risks, as opposed to just expected loss. In this work, we develop a method for controlling the general class of Optimized CertaintyEquivalent (OCE) risks, a broad class of risk measures which includes as special cases the expected loss (generalizing the original CRC method) and common tail risks like the conditional value-at-risk (CVaR).
Fast Projection-Free Approach (without Optimization Oracle) for Optimization over Compact Convex Set
Projection-free first-order methods, e.g., the celebrated Frank-Wolfe (FW) algorithms, have emerged as powerful tools for optimization over simple convex sets such as polyhedra, because of their scalability, fast convergence, and iteration-wise feasibility without costly projections. However, extending these methods effectively to general compact convex sets remains challenging and largely open, as FW methods rely on expensive linear optimization oracles (LOO), while penalty-based methods often struggle with poor feasibility. We tackle this open challenge by presenting Hom-PGD, a novel projection-free method without expensive (optimization) oracles. Our method constructs a homeomorphism between the convex constraint set and a unit ball, transforming the original problem into an equivalent ball-constrained formulation, thus enabling efficient gradient-based optimization while preserving the original problem structure. We prove that Hom-PGD attains optimal convergence rates matching gradient descent with constant step-size to find an ϵ-approximate (stationary) solution: O(log(1/ϵ))for strongly convex objectives, O(ϵ 1) for convex objectives, and O(ϵ 2) for non-convex objectives. Meanwhile, Hom-PGD enjoys a low per-iteration complexity of O(n2), without expensive oracles like LOO or projection, where nis the input size. Our framework further extends to certain non-convex sets, broadening its applicability in practical optimization scenarios with complex constraints. Extensive numerical experiments demonstrate that Hom-PGD achieves comparable convergence rates to state-of-theart projection-free methods, while significantly reducing per-iteration runtime (up to 5 orders of magnitude faster) and thus the total problem-solving time.
Uniform Wrappers: Bridging Concave to Quadratizable Functions in Online Optimization
This paper presents novel contributions to the field of online optimization, particularly focusing on the adaptation of algorithms from concave optimization to more challenging classes of functions. Key contributions include the introduction of uniform wrappers, a class of meta-algorithms that could be used for algorithmic conversions such as converting algorithms for convex optimization into those for quadratizable optimization. Moreover, we propose a guideline that, given a base algorithm Afor concave optimization and a uniform wrapper W, describes how to convert a proof of the regret bound of A in the concave setting into a proof of the regret bound of W(A)for quadratizable setting. Through this framework, the paper demonstrates improved regret guarantees for various classes of DR-submodular functions under zeroth-order feedback. Furthermore, the paper extends zeroth-order online algorithms to bandit feedback and offline counterparts, achieving notable improvements in regret/sample complexity compared to existing approaches.
High-Dimensional Calibration from Swap Regret
We study the online calibration of multi-dimensional forecasts over an arbitrary convex set P Rd relative to an arbitrary norm k k. We connect this with the problem of external regret minimization for online linear optimization, showing that if it is possible to guarantee O( ρT) worst-case regret after T rounds when actions are drawn from P and losses are drawn from the dual k k unit norm ball, then it is also possible to obtain -calibrated forecasts after T = exp(O(ρ/2)) rounds.