 Agents


Near-Optimal No-Regret Learning Dynamics for General Convex Games

Neural Information Processing Systems

A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces--such as normal-form and extensive-form games. The question as to whether O(polylog T) regret bounds can be obtained for general convex and compact strategy sets--which occur in many fundamental models in economics and multiagent systems--while retaining efficient strategy updates is an important question.




Explicable Policy Search

Neural Information Processing Systems

Human teammates often form conscious and subconscious expectations of each other during interaction. Teaming success is contingent on whether such expectations can be met. Similarly, for an intelligent agent to operate beside a human, it must consider the human's expectation of its behavior. Disregarding such expectations can lead to the loss of trust and degraded team performance. A key challenge here is that the human's expectation may not align with the agent's optimal behavior, e.g., due to the human's partial or inaccurate understanding of the task domain.


Importance Resampling for Off-policy Prediction

Neural Information Processing Systems

Though unbiased, IS can be high-variance. A lower variance alternative is Weighted IS (WIS). Figure 4: Learning rate sensitivity plots in the Random Walk Markov Chain, with buffer size n = 15000 and mini-batch size k = 16.
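To make the IS versus WIS contrast concrete, here is a minimal sketch of the two estimators on a toy two-armed bandit. It is not the paper's Importance Resampling method. The function names (`is_estimate`, `wis_estimate`, `simulate`), the policies, and the reward values are all hypothetical and chosen only for illustration.

```python
import random


def is_estimate(rhos, returns):
    """Ordinary importance sampling: mean of ratio-weighted returns."""
    return sum(r * g for r, g in zip(rhos, returns)) / len(returns)


def wis_estimate(rhos, returns):
    """Weighted importance sampling: normalize by the sum of the ratios."""
    total = sum(rhos)
    if total == 0.0:
        return 0.0  # no usable samples; an assumed convention for this sketch
    return sum(r * g for r, g in zip(rhos, returns)) / total


def simulate(n, seed):
    """Draw n samples from an assumed behavior policy on a toy bandit."""
    rng = random.Random(seed)
    behavior = [0.9, 0.1]  # assumed behavior policy over two arms
    target = [0.1, 0.9]    # assumed target policy over the same arms
    rewards = [1.0, 2.0]   # assumed deterministic reward per arm
    rhos, returns = [], []
    for _ in range(n):
        arm = 0 if rng.random() < behavior[0] else 1
        rhos.append(target[arm] / behavior[arm])  # importance ratio pi/b
        returns.append(rewards[arm])
    return rhos, returns


if __name__ == "__main__":
    rhos, returns = simulate(1000, seed=0)
    print("IS :", is_estimate(rhos, returns))
    print("WIS:", wis_estimate(rhos, returns))
```

Because WIS divides by the sum of the ratios, its estimate is a convex combination of the observed returns and so always stays within their range. The IS estimate can fall outside that range when a few samples carry large ratios, which is one way its higher variance shows up.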