Oracle-Efficient Algorithms for Online Linear Optimization with Bandit Feedback

Neural Information Processing Systems

We propose computationally efficient algorithms for online linear optimization with bandit feedback, in which a player chooses an action vector from a given (possibly infinite) set A ⊆ R^d, and then suffers a loss that can be expressed as a linear function in action vectors.


Near-Optimal No-Regret Learning Dynamics for General Convex Games

Neural Information Processing Systems

A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces--such as normal-form and extensive-form games. Whether O(polylog T) regret bounds can be obtained for general convex and compact strategy sets--which occur in many fundamental models in economics and multiagent systems--while retaining efficient strategy updates is an important question.




Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

Arun Verma, Manjesh Hanawal, Arun Rajkumar, Raman Sankaran

Neural Information Processing Systems

The problem is challenging because the loss distribution and threshold value of each arm are unknown. We study this novel setting by establishing its 'equivalence' to Multiple-Play Multi-Armed Bandits (MP-MAB) and Combinatorial Semi-Bandits.


Online Convex Optimization with Continuous Switching Constraint

Neural Information Processing Systems

In many sequential decision-making applications, changing a decision incurs an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the overall switching cost. We first investigate the hardness of the problem, and provide a lower bound of order Ω(√T) when the switching cost budget S = Ω(√T), and Ω(min{T/S, T}) when S = O(√T), where T is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to the cumulative switching cost the player has incurred so far, based on the orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound.



d1588e685562af341ff2448de4b674d1-Paper.pdf

Neural Information Processing Systems

However, existing algorithms lack universality in the sense that they can only handle one type of convex function and need a priori knowledge of parameters.