Autoregressive Bandits
Bacchiocchi, Francesco, Genalti, Gianmarco, Maran, Davide, Mussi, Marco, Restelli, Marcello, Gatti, Nicola, Metelli, Alberto Maria
Autoregressive processes naturally arise in a large variety of real-world scenarios, including, e.g., stock markets, sales forecasting, weather prediction, advertising, and pricing. When addressing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for in order to converge to the optimal decision policy. In this work, we propose a novel online learning setting, named Autoregressive Bandits (ARBs), in which the observed reward follows an autoregressive process of order k, whose parameters depend on the action the agent chooses.

In this paper, we model the reward of a sequential decision-making problem as an AR process whose parameters depend on the action selected by the agent at every round. This scenario can be regarded as an extension of the multi-armed bandit (MAB, Lattimore & Szepesvári, 2020) problem, in which an AR process governs the temporal structure of the observed rewards through the action-dependent AR parameters, which are unknown to the agent. It is worth mentioning that such a scenario displays notable differences compared to more traditional non-stationary MABs (Gur et al., 2014). Indeed, in the presented scenario, we can exploit the knowledge that the underlying process is AR and, more importantly, that such a dynamic depends on the agent's action.
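To make the setting concrete, the following is a minimal simulation sketch (not the authors' code), assuming the standard action-dependent AR(k) reward form X_t = γ_0(a_t) + Σ_{i=1}^k γ_i(a_t) X_{t−i} + ε_t; the parameter ranges, noise scale, and action-selection loop below are illustrative choices, not taken from the paper.

```python
# Minimal sketch of an ARB environment, assuming the reward follows
# X_t = gamma_0(a_t) + sum_{i=1}^k gamma_i(a_t) * X_{t-i} + eps_t.
import numpy as np

rng = np.random.default_rng(0)

k = 2           # order of the autoregressive process
n_actions = 3   # size of the finite action set

# One parameter vector [gamma_0, gamma_1, ..., gamma_k] per action; these are
# unknown to the learner in the ARB setting and fixed here only to simulate.
# Drawing gamma_1..gamma_k in [0, 0.4] keeps their sum below 1, so the process
# induced by each action is stable (an illustrative choice).
gammas = rng.uniform(0.0, 0.4, size=(n_actions, k + 1))

history = [0.0] * k  # last k rewards, newest first: X_{t-1}, ..., X_{t-k}

def step(action: int) -> float:
    """Draw the next reward from the AR(k) process induced by `action`."""
    g = gammas[action]
    x = g[0] + g[1:] @ np.asarray(history) + rng.normal(scale=0.1)  # eps_t
    history.pop()          # discard the oldest observation X_{t-k}
    history.insert(0, x)   # the new reward becomes X_{t-1}
    return x

# The reward observed at each round depends on both the chosen action and
# the recent reward history, which is what an ARB learner must account for.
for t in range(6):
    a = t % n_actions
    print(f"t={t}  action={a}  reward={step(a):.3f}")
```

Under this stability assumption, repeatedly playing an action a drives the process toward the steady-state reward γ_0(a) / (1 − Σ_{i=1}^k γ_i(a)), which is why the temporal dependence must be accounted for when comparing actions rather than treating rewards as i.i.d. per arm.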
arXiv.org Artificial Intelligence
Dec-12-2022