The Safety-Privacy Tradeoff in Linear Bandits

Zibaie, Arghavan, Hutchinson, Spencer, Pedarsani, Ramtin, Alizadeh, Mahnoosh

arXiv.org Artificial Intelligence 

Arghavan Zibaie and Spencer Hutchinson and Ramtin Pedarsani and Mahnoosh Alizadeh University of California Santa Barbara {zibaie,shutchinson,ramtin,alizadeh}@ucsb.edu Abstract --We consider a collection of linear stochastic bandit problems, each modeling the random response of different agents to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private and in order to protect their sensitive information, the data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing higher level of privacy to different agents would have consequences in terms of safety and regret. We formalize these tradeoffs by building on the notion of the sharpness of the safety set - a measure of how the geometric properties of the safe set affects the growth of regret - and propose a unilaterally unimprovable vector of privacy levels for different agents given a maximum regret budget. I. INTRODUCTION The stochastic linear bandit problem constitutes a sequential decision-making problem wherein, at each round, we observe a response to chosen actions which is a perturbed linear function of the action parameterized by an unknown parameter vector. When applying this tool to safety-critical applications such as health care [1], power systems and transportation [2], [3], the decision-making tasks must operate within certain safety constraints that depend on the unknown bandit parameters, and violations of these constraints can result in adverse events. For example, when sequentially learning how to price electricity to manage the demand of users whose price response is unknown, the price setting entity must ensure that the resulting demands do not violate electric distribution system constraints from day one, in spite of its uncertainty about user responses. As a result, variants of the linear stochastic bandit problem with constraints have been studied in the literature, e.g., [4]-[6]. In many such safety-critical applications, however, a central challenge lies in the fact that users might consider their responses to the central coordinator's interventions to be private information. This can further complicate the task of learning optimal interventions while keeping the system safe at all rounds of the learning process. To formalize this challenge mathematically, we consider a collection of linear stochastic bandit problems whose responses are tied together through a global safety constraint, coupling what actions can be safely played on each bandit.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found