Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
Chris Cundy, Rishi Desai, Stefano Ermon
arXiv.org Artificial Intelligence
As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
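To make the regularized objective concrete, here is a minimal sketch for a toy one-step, tabular case. It is not the authors' gradient estimator: the function names, the tabular policy `pi[u][a]`, the prior `p_u`, the reward table, and the penalty weight `lam` are all assumptions introduced for illustration. It computes the exact mutual information I(u; a) between a discrete sensitive variable u and the action a, then subtracts it, scaled by `lam`, from the expected reward.

```python
import math

def mutual_information(p_u, pi):
    """Exact I(u; a) in nats for a discrete sensitive variable u and action a.

    p_u[u]   : prior probability of sensitive value u (hypothetical input)
    pi[u][a] : probability that the policy picks action a given u
    """
    n_actions = len(pi[0])
    # Marginal action distribution p(a) = sum_u p(u) * pi(a | u).
    p_a = [sum(p_u[u] * pi[u][a] for u in range(len(p_u)))
           for a in range(n_actions)]
    mi = 0.0
    for u, pu in enumerate(p_u):
        for a in range(n_actions):
            joint = pu * pi[u][a]
            if joint > 0.0:  # convention: 0 * log 0 = 0
                mi += joint * math.log(pi[u][a] / p_a[a])
    return mi

def regularized_objective(p_u, pi, reward, lam):
    """Expected one-step reward minus lam * I(u; a) (illustrative only)."""
    expected_reward = sum(
        p_u[u] * pi[u][a] * reward[u][a]
        for u in range(len(p_u))
        for a in range(len(pi[0]))
    )
    return expected_reward - lam * mutual_information(p_u, pi)

# Toy setup: two equally likely sensitive values; reward 1 when a matches u.
p_u = [0.5, 0.5]
reward = [[1.0, 0.0], [0.0, 1.0]]
revealing = [[1.0, 0.0], [0.0, 1.0]]  # action copies u: leaks log(2) nats
hiding = [[0.5, 0.5], [0.5, 0.5]]     # action ignores u: leaks nothing

score_revealing = regularized_objective(p_u, revealing, reward, lam=1.0)
score_hiding = regularized_objective(p_u, hiding, reward, lam=1.0)
```

In this toy setting the revealing policy earns the full reward but pays a penalty of log 2 nats, so with `lam=1.0` the hiding policy scores higher. With a small `lam` the ordering flips, which illustrates the reward-versus-disclosure trade-off the abstract describes.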
Apr-16-2024
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Information Technology > Security & Privacy (1.00)