cumulative payoff
Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs
Han Shao, Xiaotian Yu, Irwin King, Michael R. Lyu
In linear stochastic bandits, it is commonly assumed that pa yoffs are with sub-Gaussian noises. In this paper, under a weaker assumption on noises, we study the problem of lin ear stochastic b andits with he avy-t ailed payoffs (LinBET), where the distributions have finite moments of order 1+ ฯต,f o rs o m e ฯต (0, 1] .W e rigorously analyze the regret lower bound of LinBET as โฆ( T
Selection pressure/Noise driven cooperative behaviour in the thermodynamic limit of repeated games
Consider the scenario where an infinite number of players (i.e., the \textit{thermodynamic} limit) find themselves in a Prisoner's dilemma type situation, in a \textit{repeated} setting. Is it reasonable to anticipate that, in these circumstances, cooperation will emerge? This paper addresses this question by examining the emergence of cooperative behaviour, in the presence of \textit{noise} (or, under \textit{selection pressure}), in repeated Prisoner's Dilemma games, involving strategies such as \textit{Tit-for-Tat}, \textit{Always Defect}, \textit{GRIM}, \textit{Win-Stay, Lose-Shift}, and others. To analyze these games, we employ a numerical Agent-Based Model (ABM) and compare it with the analytical Nash Equilibrium Mapping (NEM) technique, both based on the \textit{1D}-Ising chain. We use \textit{game magnetization} as an indicator of cooperative behaviour. A significant finding is that for some repeated games, a discontinuity in the game magnetization indicates a \textit{first}-order \textit{selection pressure/noise}-driven phase transition. The phase transition is particular to strategies where players do not severely punish a single defection. We also observe that in these particular cases, the phase transition critically depends on the number of \textit{rounds} the game is played in the thermodynamic limit. For all five games, we find that both ABM and NEM, in conjunction with game magnetization, provide crucial inputs on how cooperative behaviour can emerge in an infinite-player repeated Prisoner's dilemma game.
Multi-agent Cooperation in Diverse Population Games
We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.
Multi-agent Cooperation in Diverse Population Games
We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases to the initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.
Multi-agent Cooperation in Diverse Population Games
We consider multi-agent systems whose agents compete for resources by striving to be in the minority group. The agents adapt to the environment by reinforcement learning of the preferences of the policies they hold. Diversity of preferences of policies is introduced by adding random biases tothe initial cumulative payoffs of their policies. We explain and provide evidence that agent cooperation becomes increasingly important when diversity increases. Analyses of these mechanisms yield excellent agreement with simulations over nine decades of data.