logt
- Asia > China > Beijing > Beijing (0.05)
- Africa > Middle East > Algeria > Sétif Province > Sétif (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Near-OptimalNo-RegretLearningDynamicsfor GeneralConvexGames
A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have only been limited to certain classes of games with structured strategy spaces--such as normal-form and extensive-form games. The question as to whether O(polylogT) regret bounds can be obtained for general convex and compact strategy sets--which occur in many fundamental models in economics and multiagent systems--while retaining efficient strategy updates is an importantquestion.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Hawaii (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Minnesota (0.05)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Africa > Mali (0.04)
OnlineConvexOptimization withContinuousSwitchingConstraint
In many sequential decision making applications, the change of decision would bring an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the overall switching cost. We first investigate the hardness of the problem, and provide a lower bound of orderΩ( T)whentheswitchingcostbudgetS = Ω( T),andΩ(min{T/S,T}) whenS = O( T), where T is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to thecumulative switchingcostofthe playerincurredso farbasedonthe orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)