Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Wenhao Yang, Sifan Yang, Lijun Zhang
Reflecting the greater significance of recent history over the distant past in non-stationary environments, $λ$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $λ$ is given, online gradient descent (OGD) with an appropriate step size achieves an $O(1/\sqrt{1-λ})$ discounted regret. However, the value of $λ$ is often not predetermined in real-world scenarios. This raises a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we answer this question affirmatively by providing a novel analysis demonstrating that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-λ)})$ discounted regret, holding for all values of $λ$ across a continuous interval simultaneously. The basic idea is to maintain multiple OGD instances to handle different discount factors, and to aggregate their outputs sequentially via an online prediction algorithm named Discounted-Normal-Predictor (DNP) (Kapralov and Panigrahy, 2010). Our analysis reveals that DNP can combine the decisions of two experts even when they operate on discounted regret with different discount factors.
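The baseline result cited in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical one-dimensional instantiation (not the paper's SOGD algorithm): plain OGD whose step size is tuned to a known discount factor $λ$, roughly $η \propto \sqrt{1-λ}$, run on quadratic losses, with the $λ$-discounted regret measured against the best fixed comparator in hindsight. The loss family, domain radius, and constants are illustrative assumptions.

```python
import numpy as np

def discounted_ogd(grad_fn, T, lam, radius=1.0, lipschitz=2.0):
    """OGD with step size tuned to discount factor lam.

    With eta ~ radius * sqrt(1 - lam) / lipschitz, OGD attains
    O(1/sqrt(1 - lam)) lam-discounted regret when lam is known in
    advance (the baseline the abstract refers to). Returns the
    sequence of iterates x_1, ..., x_T, each projected onto
    [-radius, radius].
    """
    eta = radius * np.sqrt(1.0 - lam) / lipschitz
    x, xs = 0.0, []
    for t in range(T):
        xs.append(x)
        g = grad_fn(t, x)                      # subgradient of f_t at x_t
        x = np.clip(x - eta * g, -radius, radius)  # projected gradient step
    return np.array(xs)

def discounted_regret(xs, zs, lam):
    """lam-discounted regret sum_t lam^(T-t) (f_t(x_t) - f_t(u)) for
    quadratic losses f_t(x) = 0.5*(x - z_t)^2, where u is the best
    fixed point in hindsight (the discount-weighted mean of z_t)."""
    T = len(xs)
    w = lam ** np.arange(T - 1, -1, -1.0)      # weights lam^(T-t)
    u = np.sum(w * zs) / np.sum(w)             # weighted least-squares minimizer
    losses = 0.5 * (xs - zs) ** 2
    comp = 0.5 * (u - zs) ** 2
    return np.sum(w * (losses - comp))

# Usage: quadratic losses with targets z_t in [-1, 1], lam = 0.99.
rng = np.random.default_rng(0)
T, lam = 500, 0.99
zs = rng.uniform(-1.0, 1.0, size=T)
xs = discounted_ogd(lambda t, x: x - zs[t], T, lam)
reg = discounted_regret(xs, zs, lam)
```

Note that this sketch requires $λ$ up front to set the step size; the point of the paper is precisely that SOGD removes this requirement by aggregating several such instances with DNP.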
May-27-2025