learning dynamic
Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games
In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after T repetitions of the game is bounded by O(log T), improving over the prior best bounds of O(log^4 T). At the same time, we guarantee optimal O(√T) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time T are bounded by O(log T), a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).
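To make the quantity being bounded concrete, the following is a minimal sketch of how swap regret can be measured for one player from a log of play. It is not the paper's learning dynamics: the function names `swap_regret` and `external_regret`, and the assumption that the player's mixed strategies and utility vectors are available as T-by-n arrays, are illustrative choices made here.

```python
import numpy as np

def swap_regret(strategies, utilities):
    """Swap regret of one player over T rounds (illustrative helper).

    strategies: (T, n) array, row t is the mixed strategy x_t.
    utilities:  (T, n) array, row t is the utility vector u_t.
    """
    x = np.asarray(strategies, dtype=float)
    u = np.asarray(utilities, dtype=float)
    # M[a, b] = sum_t x_t[a] * u_t[b]: utility gained if the mass placed
    # on action a had been moved to action b in every round.
    M = x.T @ u
    # gains[a, b] = benefit of swapping a -> b; the diagonal is zero.
    gains = M - np.diag(M)[:, None]
    # The best swap function picks the best target b for each a separately.
    return float(gains.max(axis=1).sum())

def external_regret(strategies, utilities):
    """External regret: compare against the best single fixed action."""
    x = np.asarray(strategies, dtype=float)
    u = np.asarray(utilities, dtype=float)
    realized = float((x * u).sum())
    return float(u.sum(axis=0).max() - realized)

# Toy log: the player always picks the action that turns out worse.
xs = [[1.0, 0.0], [0.0, 1.0]]
us = [[0.0, 1.0], [1.0, 0.0]]
sr = swap_regret(xs, us)
er = external_regret(xs, us)
print(sr, er)
```

Swap regret is always at least as large as external regret, because a constant map is one particular swap function; the toy log above is a case where the two differ.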
Near-Optimal No-Regret Learning Dynamics for General Convex Games
A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces--such as normal-form and extensive-form games. The question as to whether O(polylog T) regret bounds can be obtained for general convex and compact strategy sets--which occur in many fundamental models in economics and multiagent systems--while retaining efficient strategy updates is an important question.
Supplementary Information
A The principle of least action and the Euler-Lagrange equation
Here, we review the principle of least action and the derivation of the Euler-Lagrange equation [
Now, let us derive the differential equation that gives a solution to the variational problem. This condition yields the Euler-Lagrange equation, $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$. Here, we derive Noether's learning dynamics by applying Noether's theorem to the ... A general form of Noether's theorem relates the dynamics of Noether ... By evaluating the right-hand side of Eq. 23, we get ... Now, we harness the covariant property of the Lagrangian formulation, i.e., it preserves the form ... Plugging this expression obtained from the steady-state condition of Eq. 27 ... Here, we ignore the inertia term in Eq. 16, assuming that the mass (learning rate) is finite but small ... All the experiments were run using the PyTorch code base. We used the Tiny ImageNet dataset to generate all the empirical figures in this work. The key hyperparameters we used are listed with each figure.
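For reference, the Euler-Lagrange equation mentioned above follows from the standard textbook argument sketched below. The symbols $S$, $t_0$, $t_1$ and $\delta q$ are notation assumed here for the action, the endpoints and the variation; they are not taken from the supplementary text, whose own derivation is not reproduced in this excerpt.

```latex
% Action functional for a trajectory q(t) with fixed endpoints
S[q] = \int_{t_0}^{t_1} L\bigl(q(t), \dot{q}(t), t\bigr)\, dt

% First variation under q -> q + \delta q, with \delta q(t_0) = \delta q(t_1) = 0
\delta S = \int_{t_0}^{t_1}
  \left( \frac{\partial L}{\partial q}\, \delta q
       + \frac{\partial L}{\partial \dot{q}}\, \delta \dot{q} \right) dt

% Integrate the second term by parts; the boundary term vanishes
\delta S = \left[ \frac{\partial L}{\partial \dot{q}}\, \delta q \right]_{t_0}^{t_1}
  + \int_{t_0}^{t_1}
  \left( \frac{\partial L}{\partial q}
       - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}} \right) \delta q \, dt

% Stationarity \delta S = 0 for every admissible \delta q gives
\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}
```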
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with the minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players.
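As a rough illustration of the smoothed best-response idea for a single matrix game, the sketch below has each player move a small step toward a softmax response to the opponent's current mixed strategy. This is a simplified, full-information variant written for illustration only: it is not the payoff-based algorithm of the paper, it omits the minimax value iteration layer for stochastic games, and the function names, temperature `tau` and step size `step` are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def smoothed_best_response_dynamics(A, x0, y0, tau=1.0, step=0.05, iters=2000):
    """Illustrative smoothed best-response dynamics for a zero-sum matrix game.

    A is the row player's payoff matrix: the row player maximizes x^T A y
    and the column player minimizes it. Both players update simultaneously.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        # Softmax responses to the opponent's current strategy.
        br_x = softmax(A @ y / tau)
        br_y = softmax(-(A.T @ x) / tau)
        # Move a small step toward the smoothed best response.
        x = (1 - step) * x + step * br_x
        y = (1 - step) * y + step * br_y
    return x, y

# Matching pennies; by symmetry the smoothed equilibrium is uniform play.
A = [[1.0, -1.0], [-1.0, 1.0]]
x, y = smoothed_best_response_dynamics(A, [0.9, 0.1], [0.2, 0.8])
print(x, y)
```

With these settings both strategies should approach uniform play on matching pennies. Smaller values of `tau` bring the softmax closer to an exact best response, in which case a smaller `step` may be needed for the iterates to settle.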
Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
Without relevant human priors, neural networks may learn uninterpretable features. We propose Dynamics of Attention for Focus Transition (DAFT) as a human prior for machine reasoning. DAFT is a novel method that regularizes attention-based reasoning by modelling it as a continuous dynamical system using neural ordinary differential equations. As a proof of concept, we augment a state-of-the-art visual reasoning model with DAFT. Our experiments reveal that applying DAFT yields similar performance to the original model while using fewer reasoning steps, showing that it implicitly learns to skip unnecessary steps. We also propose a new metric, Total Length of Transition (TLT), which represents the effective reasoning step size by quantifying how much a given model's focus drifts while reasoning about a question. We show that adding DAFT results in lower TLT, demonstrating that our method indeed obeys the human prior towards shorter reasoning paths in addition to producing more interpretable attention maps.