
 Edmonton


A Universal Primal-Dual Convex Optimization Framework

Neural Information Processing Systems

We propose a new primal-dual algorithmic framework for a prototypical constrained convex optimization template. The algorithmic instances of our framework are universal since they can automatically adapt to the unknown Hölder continuity degree and constant within the dual formulation. They are also guaranteed to have optimal convergence rates in the objective residual and the feasibility gap for each Hölder smoothness degree. In contrast to existing primal-dual algorithms, our framework avoids the proximity operator of the objective function. We instead leverage computationally cheaper, Fenchel-type operators, which are the main workhorses of the generalized conditional gradient (GCG)-type methods. In contrast to the GCG-type methods, our framework does not require the objective function to be differentiable, and can also process additional general linear inclusion constraints, while guaranteeing the convergence rate on the primal problem.
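To illustrate why Fenchel-type operators can be cheaper than proximity operators, here is a minimal sketch of the linear minimization oracle that conditional-gradient-type methods rely on, for the special case of an l1-ball constraint. The function name `lmo_l1_ball`, the choice of constraint set, and the example vector are assumptions made for illustration; this is not the paper's algorithm. The oracle only needs the largest-magnitude coordinate of its input and returns a vertex of the ball.

```python
def lmo_l1_ball(g, radius=1.0):
    """Return a minimizer of <g, s> over the l1 ball {s : ||s||_1 <= radius}.

    Illustrative assumption: the constraint set is an l1 ball. A linear
    function over this set is minimized at a vertex, so one pass over g
    to find its largest-magnitude coordinate is enough.
    """
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    s = [0.0] * len(g)
    if g[i] != 0:
        # Move against the sign of the dominant coordinate.
        s[i] = -radius if g[i] > 0 else radius
    return s


g = [0.5, -2.0, 1.0]          # hypothetical gradient-like vector
s = lmo_l1_ball(g, radius=3.0)
inner = sum(gi * si for gi, si in zip(g, s))
```

Here `s` puts all of its mass on the second coordinate, and `inner` equals `-radius` times the largest absolute entry of `g`.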








Polynomial-Time Optimal Equilibria with a Mediator in Extensive-Form Games

Neural Information Processing Systems

For common notions of correlated equilibrium in extensive-form games, computing an optimal (e.g., welfare-maximizing) equilibrium is NP-hard. Other equilibrium notions, namely communication [11] and certification [12] equilibria, augment the game with a mediator that can both send messages to and receive messages from the players and, in particular, can remember those messages. In this paper, we investigate both notions in extensive-form games through a computational lens. We show that optimal equilibria in both notions can be computed in polynomial time, the latter under a natural additional assumption known in the literature. Our proof works by constructing a mediator-augmented game of polynomial size that explicitly represents the mediator's decisions and actions.


Average-Reward Learning and Planning with Options

Yi Wan, Abhishek Naik, Richard S. Sutton ({wan6,anaik1,rsutton}@ualberta.ca), University of Alberta, Amii

Neural Information Processing Systems

We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs. Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms. Our algorithms and convergence proofs extend those recently developed by Wan, Naik, and Sutton.
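As a rough illustration of the average-reward setting, the sketch below runs a one-step differential TD-style update on a tiny deterministic two-state chain. It follows the general form of the earlier average-reward algorithms that the abstract says this work extends (a TD error measured relative to a learned reward-rate estimate); it is not the inter-option or intra-option algorithm of the paper. The chain, the rewards, the step sizes, and the name `differential_td` are all assumptions chosen for illustration.

```python
def differential_td(num_steps=2000, alpha=0.1, eta=1.0):
    """One-step differential TD-style learning on a toy two-state cycle.

    Assumed toy problem: state 0 -> state 1 with reward 1,
    state 1 -> state 0 with reward 3, so the true reward rate is 2.
    """
    next_state = {0: 1, 1: 0}
    reward = {0: 1.0, 1: 3.0}
    v = [0.0, 0.0]   # differential value estimates
    r_bar = 0.0      # reward-rate (average-reward) estimate
    s = 0
    for _ in range(num_steps):
        s_next = next_state[s]
        # TD error relative to the current reward-rate estimate.
        delta = reward[s] - r_bar + v[s_next] - v[s]
        v[s] += alpha * delta
        r_bar += eta * alpha * delta
        s = s_next
    return v, r_bar


values, reward_rate = differential_td()
```

On this toy chain the reward-rate estimate approaches 2, and the value difference `values[1] - values[0]` approaches 1; differential values are only determined up to an additive constant, so only their difference is meaningful here.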