Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
Neural Information Processing Systems
Three Operator Splitting (TOS) (Davis & Yin, 2017) can effectively minimize the sum of multiple convex functions when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients, only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients, and are shown to ensure an $\mathcal{O}(1/\sqrt{t})$ convergence rate; the third equips TOS with adaptive learning rates. We compare our proposed methods with competing methods on various applications.
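To make the setting concrete, below is a minimal Python sketch of the classical Davis–Yin iteration in which the smooth-gradient call is replaced by a subgradient oracle and the step size decays at the rate suggested by the $\mathcal{O}(1/\sqrt{t})$ bound. The problem instance (an $\ell_1$ data-fit term plus an $\ell_1$ penalty over an $\ell_2$-ball constraint) and all function names are illustrative assumptions, not the paper's algorithm or experiments.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, radius=1.0):
    """Projection onto the l2 ball, i.e. the prox of its indicator function."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def subgrad_h(x, A, b):
    """A subgradient of the nonsmooth data-fit term h(x) = ||Ax - b||_1."""
    return A.T @ np.sign(A @ x - b)

# Hypothetical toy data for illustration only.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
lam = 0.1  # l1 penalty weight

z = np.zeros(20)
for t in range(1, 1001):
    gamma = 1.0 / np.sqrt(t)          # decaying step size matching the O(1/sqrt(t)) rate
    x_g = project_ball(z)             # prox step on the constraint g
    u = subgrad_h(x_g, A, b)          # subgradient in place of a full gradient; a
                                      # minibatch stochastic gradient would give the
                                      # stochastic variant instead
    x_f = prox_l1(2 * x_g - z - gamma * u, gamma * lam)  # prox step on the penalty f
    z = z + (x_f - x_g)               # Davis-Yin fixed-point update

x = project_ball(z)  # feasible iterate approximating a minimizer
```

With a fixed step size and a true gradient of a smooth term, this reduces to the standard Davis–Yin scheme; the only change sketched here is the oracle and the step-size schedule.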