
 Dimitrakakis, Christos


Achieving Privacy in the Adversarial Multi-Armed Bandit

AAAI Conferences

In this paper, we improve the previously best known regret bound for achieving ε-differential privacy in oblivious adversarial bandits from O(T^{2/3}/ε) to O(√(T ln T)/ε). This is achieved by combining a Laplace mechanism with EXP3. We show that, though EXP3 is already differentially private, it leaks a linear amount of information in T. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach O(√(ln T))-DP, with a regret of O(T^{2/3}) that holds against an adaptive adversary, an improvement over the previous best known bound of O(T^{3/4}). This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.
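The combination described above can be sketched in a few lines: EXP3 maintains exponential weights over arms, and each observed reward is perturbed with Laplace noise before the weight update. This is an illustrative sketch under assumed details (the noise scale 1/ε, the reward range [0, 1], and the function names are hypothetical), not the paper's exact algorithm or calibration.

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) by inverse-CDF from a uniform in (-0.5, 0.5)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def exp3_laplace(reward_fn, n_arms, horizon, epsilon=1.0):
    """EXP3 where each observed reward is perturbed with Laplace noise
    before the importance-weighted update (hypothetical sketch)."""
    gamma = min(1.0, math.sqrt(n_arms * math.log(n_arms)
                               / ((math.e - 1) * horizon)))
    weights = [1.0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        # sample an arm from the mixture distribution
        r, acc, arm = random.random(), 0.0, n_arms - 1
        for i, p in enumerate(probs):
            acc += p
            if r < acc:
                arm = i
                break
        reward = reward_fn(t, arm)                        # true reward in [0, 1]
        private = reward + laplace_noise(1.0 / epsilon)   # Laplace mechanism
        est = private / probs[arm]                        # importance-weighted estimate
        weights[arm] *= math.exp(gamma * est / n_arms)
        m = max(weights)
        weights = [w / m for w in weights]                # normalise to avoid overflow
    return weights
```

Because only the noisy reward enters the update, the learner's outputs depend on any single reward only through the Laplace mechanism.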


Thompson Sampling for Stochastic Bandits with Graph Feedback

AAAI Conferences

We present a simple set of algorithms based on Thompson Sampling for stochastic bandit problems with graph feedback. Thompson Sampling is generally applicable, without the need to construct complicated upper confidence bounds. As we show in this paper, it has excellent performance in problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, as well as extensive experimental results on real and simulated networks. More specifically, we tested our algorithms on power law, planted partitions and Erdős–Rényi graphs, as well as on graphs derived from Facebook and Flixster data and show that they clearly outperform related methods that employ upper confidence bounds.
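The key idea is easy to sketch for Bernoulli rewards: playing an arm also reveals the rewards of its neighbours in the feedback graph, and every observation updates the corresponding posterior. The Beta(1, 1) priors, the `neighbors` adjacency representation, and the function name below are illustrative assumptions, not the paper's exact algorithms.

```python
import random

def thompson_graph(reward_fn, neighbors, horizon, seed=0):
    """Bernoulli Thompson Sampling with graph feedback: playing arm i
    also reveals the rewards of the arms in neighbors[i] (a sketch).
    Beta(1, 1) priors; the posterior of every observed arm is updated."""
    rng = random.Random(seed)
    n_arms = len(neighbors)
    alpha = [1] * n_arms  # Beta posterior: 1 + observed successes
    beta = [1] * n_arms   # Beta posterior: 1 + observed failures
    pulls = [0] * n_arms
    for t in range(horizon):
        # draw one sample per arm and play the argmax
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        # graph feedback: observe the played arm and all its neighbours
        for j in {arm} | set(neighbors[arm]):
            r = reward_fn(t, j)  # Bernoulli reward in {0, 1}
            alpha[j] += r
            beta[j] += 1 - r
    return pulls
```

With a denser graph, each round updates more posteriors, which is why the extra feedback speeds up learning.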


Bayesian Differential Privacy through Posterior Sampling

arXiv.org Machine Learning

Differential privacy formalises privacy-preserving mechanisms that provide access to a database. We pose the question of whether Bayesian inference itself can be used directly to provide private access to data, with no modification. The answer is affirmative: under certain conditions on the prior, sampling from the posterior distribution can be used to achieve a desired level of privacy and utility. To do so, we generalise differential privacy to arbitrary dataset metrics, outcome spaces and distribution families. This allows us to also deal with non-i.i.d. or non-tabular datasets. We prove bounds on the sensitivity of the posterior to the data, which gives a measure of robustness. We also show how to use posterior sampling to provide differentially private responses to queries, within a decision-theoretic framework. Finally, we provide bounds on the utility and on the distinguishability of datasets. The latter are complemented by a novel use of Le Cam's method to obtain lower bounds. All our general results hold for arbitrary database metrics, including those for the common definition of differential privacy. For specific choices of the metric, we give a number of examples satisfying our assumptions.
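In the simplest conjugate setting the mechanism is just: answer a query with a single draw from the posterior rather than releasing the posterior itself. The sketch below uses a Beta-Bernoulli model; the Beta(2, 2) prior and the function name are illustrative assumptions (a prior bounded away from 0 and 1 is the flavour of condition under which such guarantees can hold, but the exact constants here carry no claim about the paper's theorems).

```python
import random

def private_posterior_sample(data, prior_a=2.0, prior_b=2.0, seed=None):
    """Answer a query about the mean of binary data with one draw from
    the Beta posterior, instead of releasing the posterior exactly
    (posterior-sampling mechanism, sketched for Beta-Bernoulli)."""
    rng = random.Random(seed)
    a = prior_a + sum(data)               # posterior success parameter
    b = prior_b + len(data) - sum(data)   # posterior failure parameter
    return rng.betavariate(a, b)          # one posterior draw = the response
```

No noise is added anywhere: the only randomness protecting the data is the posterior's own spread, which is why conditions on the prior (and hence on posterior concentration) are needed.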


Algorithms for Differentially Private Multi-Armed Bandits

AAAI Conferences

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist (ε, δ) differentially private variants of Upper Confidence Bound algorithms which have optimal regret, O(ε^{-1} + log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ε^{-2} log^3 T), because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.
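The interval idea can be sketched as follows: instead of releasing a fresh noisy statistic at every step, the algorithm refreshes each arm's noisy mean only when that arm's pull count crosses an interval boundary, so noise is injected far fewer times. This is an illustrative sketch under assumed details (the fixed interval length, Laplace calibration, and function names are hypothetical), not the paper's exact mechanism or privacy accounting.

```python
import math
import random

def laplace(rng, scale):
    """Sample Laplace(0, scale) by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_ucb(reward_fn, n_arms, horizon, epsilon=1.0, interval=16, seed=0):
    """UCB with a simple interval-based release: each arm's noisy mean is
    refreshed only when its pull count hits a multiple of `interval`,
    so Laplace noise is added O(T / interval) times rather than every step."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    noisy_mean = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t  # play each arm once to initialise
        else:
            ucb = [noisy_mean[i] + math.sqrt(2 * math.log(t) / counts[i])
                   for i in range(n_arms)]
            arm = ucb.index(max(ucb))
        r = reward_fn(t, arm)  # reward in [0, 1]
        counts[arm] += 1
        sums[arm] += r
        if counts[arm] == 1 or counts[arm] % interval == 0:
            # release a fresh noisy mean only at interval boundaries
            noisy_mean[arm] = (sums[arm] + laplace(rng, 1.0 / epsilon)) / counts[arm]
    return counts
```

Between releases the algorithm runs on a stale noisy mean, trading a little estimation lag for a large reduction in total injected noise.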


On the Differential Privacy of Bayesian Inference

AAAI Conferences

While B wants to learn as much as possible from the data, she doesn't want A to learn about any individual datum. This is for example the case where A is an insurance agency, the data are medical records, and B wants to convey the efficacy of drugs to the agency, without revealing the specific illnesses of individuals in the population. Such requirements of privacy are of growing interest in the learning (Chaudhuri and Hsu 2012; Duchi, Jordan, and Wainwright 2013), theoretical computer science (Dwork and Smith 2009; McSherry and Talwar 2007) and databases communities (Barak et al.). The latter achieves stealth through consistent posterior updates. For general Bayesian networks, posteriors may be nonparametric. In this case, we explore a mechanism (Dimitrakakis et al. 2014) which samples from the posterior to answer queries; no additional noise is injected. We complement our study with a maximum a posteriori estimator that leverages the exponential mechanism (McSherry and Talwar 2007). Our utility and privacy bounds connect privacy and graph/dependency structure, and are complemented by illustrative experiments with Bayesian naïve Bayes and linear regression.


On the Differential Privacy of Bayesian Inference

arXiv.org Machine Learning

We study how to communicate findings of Bayesian inference to third parties, while preserving the strong guarantee of differential privacy. Our main contributions are four different algorithms for private Bayesian inference on probabilistic graphical models. These include two mechanisms for adding noise to the Bayesian updates, either directly to the posterior parameters, or to their Fourier transform so as to preserve update consistency. We also utilise a recently introduced posterior sampling mechanism, for which we prove bounds for the specific but general case of discrete Bayesian networks; and we introduce a maximum-a-posteriori private mechanism. Our analysis includes utility and privacy bounds, with a novel focus on the influence of graph structure on privacy. Worked examples and experiments with Bayesian naïve Bayes and Bayesian linear regression illustrate the application of our mechanisms.
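The first kind of mechanism mentioned above, adding noise directly to the posterior parameters, can be sketched for a Beta-Bernoulli model: the posterior parameters are counts, a single record moves the pair of counts by at most 1 each (L1 sensitivity 2), so Laplace noise of scale 2/ε on each count gives ε-DP, and clamping back to valid parameters is harmless post-processing. The function name and the clamping choice are illustrative assumptions, not the paper's exact calibration.

```python
import math
import random

def laplace(rng, scale):
    """Sample Laplace(0, scale) by inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_beta_posterior(data, epsilon, prior_a=1.0, prior_b=1.0, seed=None):
    """Release a private Beta posterior for binary data by adding Laplace
    noise directly to the posterior parameters (the sufficient-statistic
    counts). One record changes each count by at most 1, so the pair has
    L1 sensitivity 2 and scale 2/epsilon noise gives epsilon-DP."""
    rng = random.Random(seed)
    ones = sum(data)
    zeros = len(data) - ones
    # clamp to the prior so the released parameters stay valid (post-processing)
    a = max(prior_a, prior_a + ones + laplace(rng, 2.0 / epsilon))
    b = max(prior_b, prior_b + zeros + laplace(rng, 2.0 / epsilon))
    return a, b  # parameters of the released Beta(a, b) posterior
```

For graphical models the same idea applies per clique of sufficient statistics, which is where the dependence of privacy on graph structure enters.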


Algorithms for Differentially Private Multi-Armed Bandits

arXiv.org Machine Learning

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(\epsilon, \delta)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(\epsilon^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.


Generalised Entropy MDPs and Minimax Regret

arXiv.org Machine Learning

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
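The worst-case-prior idea is easiest to see in a toy two-hypothesis decision problem: for each prior, the Bayes-optimal action minimises expected loss, and the least favourable prior maximises that minimum, giving the value of the zero-sum game. The grid-search sketch below is a deliberately simple illustration of this minimax-Bayes idea under assumed names and a finite loss table, not the paper's MDP setting.

```python
def worst_case_prior(loss, n_priors=1000):
    """Least favourable prior for a two-hypothesis, finite-action decision
    problem by grid search over the prior p on hypothesis 0. For each p,
    the Bayes risk is the minimum over actions of expected loss; the
    worst-case prior maximises that minimum (the game's value).
    loss[h][a] is the loss of action a under hypothesis h."""
    best_p, best_value = 0.0, float("-inf")
    n_actions = len(loss[0])
    for k in range(n_priors + 1):
        p = k / n_priors
        bayes_risk = min(p * loss[0][a] + (1 - p) * loss[1][a]
                         for a in range(n_actions))
        if bayes_risk > best_value:
            best_p, best_value = p, bayes_risk
    return best_p, best_value
```

For the symmetric 0-1 loss table `[[0, 1], [1, 0]]` this recovers the uniform prior, the classic least favourable prior for symmetric binary decisions.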


The Reinforcement Learning Competition 2014

AI Magazine

Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

