
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing Systems

We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithmic framework for randomized exploration in parallel Markov Decision Processes (MDPs), along with two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, which incorporate the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo (LMC) exploration strategy, respectively; both are flexible in design and easy to implement in practice.
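As a rough illustration of the PHE idea (not the paper's parallel-MDP algorithm), the sketch below runs perturbed-history exploration on a toy linear bandit: each round, every historical reward is re-perturbed with fresh Gaussian noise, a ridge estimate is fit to the perturbed history, and the greedy arm under that estimate is played. All names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, noise_scale = 3, 2000, 1.0
theta_star = np.array([0.5, -0.2, 0.8])   # unknown parameter (toy)
arms = rng.normal(size=(10, d))           # fixed arm feature vectors

X, y = [], []                             # interaction history
for t in range(T):
    if not X:
        a = int(rng.integers(len(arms)))  # no history yet: pick at random
    else:
        Xh, yh = np.array(X), np.array(y)
        # PHE: perturb every historical reward with fresh Gaussian noise,
        # solve regularized least squares, then act greedily.
        y_pert = yh + noise_scale * rng.normal(size=len(yh))
        A = Xh.T @ Xh + np.eye(d)
        theta_hat = np.linalg.solve(A, Xh.T @ y_pert)
        a = int(np.argmax(arms @ theta_hat))
    r = arms[a] @ theta_star + 0.1 * rng.normal()
    X.append(arms[a]); y.append(r)
```

Re-perturbing the whole history each round is what makes the induced policy randomized enough to explore, without maintaining an explicit posterior as in exact Thompson Sampling.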



Neural Information Processing Systems

We thank all reviewers for their valuable feedback and comments; please find our responses below. Reviewer 1 - Explanation in the introduction: we strive for clarity, appreciate this comment, and thank the reviewer for pointing this out. This can be done in many ways, as discussed in Appendix C; note, however, that the theoretical value used for the bounds is rather conservative.



Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Neural Information Processing Systems

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using any constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
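The algorithm studied here is the classic softmax (preference-based) stochastic gradient bandit. A minimal sketch with a constant learning rate, on assumed toy reward means, might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, alpha = 5, 10000, 0.1                # arms, rounds, constant learning rate
mu = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # toy mean rewards (assumed)

H = np.zeros(K)                            # softmax preferences
for t in range(T):
    pi = np.exp(H - H.max()); pi /= pi.sum()
    a = rng.choice(K, p=pi)
    r = mu[a] + 0.1 * rng.normal()         # noisy reward for the sampled arm
    # REINFORCE-style update: raise the sampled arm's preference,
    # lower the others, scaled by the observed reward.
    grad = -r * pi
    grad[a] += r
    H += alpha * grad
```

The update uses only the sampled arm's reward, with no baseline and no decaying step size; the convergence claim above is precisely that this plain recursion still finds the best arm almost surely for any constant `alpha`.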


Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

Neural Information Processing Systems

Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted.


Appendix A More related works

Neural Information Processing Systems

Besides proportionality, a parallel line of research widely studies envy-freeness and its relaxations, namely envy-freeness up to one item (EF1) and envy-freeness up to any item (EFX). It was shown in [35] and [11], for goods and chores respectively, that an EF1 allocation exists for monotone combinatorial functions. However, the existence of EFX allocations remains open even for additive functions; approximation algorithms have therefore been proposed in [2, 42] for additive functions and in [39, 16] for subadditive functions. We refer readers to [3] for a detailed survey on the fair allocation of indivisible items.
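As a concrete anchor for the EF1 notion (not an algorithm from the cited works), the classic round-robin procedure is known to produce an EF1 allocation of goods under additive valuations; a minimal sketch with an EF1 checker:

```python
def round_robin(valuations):
    """Agents take turns picking their favourite remaining item.
    For additive valuations over goods this yields an EF1 allocation."""
    n, m = len(valuations), len(valuations[0])
    remaining = set(range(m))
    bundles = [[] for _ in range(n)]
    turn = 0
    while remaining:
        i = turn % n
        g = max(remaining, key=lambda g: valuations[i][g])
        bundles[i].append(g)
        remaining.remove(g)
        turn += 1
    return bundles

def is_ef1(valuations, bundles):
    """Check envy-freeness up to one item under additive valuations."""
    for i, vi in enumerate(valuations):
        mine = sum(vi[g] for g in bundles[i])
        for j, bj in enumerate(bundles):
            if i == j or not bj:
                continue
            other = sum(vi[g] for g in bj)
            # i may envy j, but removing j's best item (in i's eyes) must fix it
            if mine < other - max(vi[g] for g in bj):
                return False
    return True
```

Here an agent may still envy another's bundle, but never once the single most valuable item (from the envious agent's perspective) is removed from it, which is exactly the EF1 relaxation.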



Provably Safe Reinforcement Learning with Step-wise Violation Constraints
Institute for Interdisciplinary Information Sciences, Tsinghua University

Neural Information Processing Systems

We investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we focus on stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications that need to ensure safety in all decision steps but may not always possess safe actions, e.g., robot control and autonomous driving.


Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms

Neural Information Processing Systems

We study and provide instance-optimal algorithms in differential privacy by extending and approximating the inverse sensitivity mechanism. We provide two approximation frameworks: one that requires only knowledge of local sensitivities, and a gradient-based approximation for optimization problems; both are efficiently computable for a broad class of functions. We complement our analysis with instance-specific lower bounds for vector-valued functions, which demonstrate that our mechanisms are (nearly) instance-optimal under certain assumptions and that minimax lower bounds may not accurately estimate the hardness of a problem in general: our algorithms can significantly outperform minimax bounds on well-behaved instances. Finally, we use our approximation framework to develop private mechanisms for unbounded-range mean estimation, principal component analysis, and linear regression. For PCA, our mechanisms give an efficient (pure) differentially private algorithm with near-optimal rates.
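To make the base mechanism concrete: the inverse sensitivity mechanism samples an output y with probability proportional to exp(-eps * len(x, y) / 2), where len(x, y) is the minimum number of records of x that must change before f can output y. Below is a toy sketch (not the paper's approximation frameworks) for a counting query, where this length is simply |y - f(x)|; all names and constants are illustrative assumptions.

```python
import numpy as np

def inverse_sensitivity_mech(candidates, lengths, eps, rng):
    """Sample candidates[k] with probability proportional to
    exp(-eps * lengths[k] / 2), where lengths[k] is the inverse
    sensitivity: the minimum number of records of x that must
    change before f(x) can equal candidates[k]."""
    scores = -eps * np.asarray(lengths, dtype=float) / 2.0
    p = np.exp(scores - scores.max())   # subtract max for numerical stability
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]

# Toy instance: f(x) = number of ones in a bit vector. Changing one
# record moves the count by at most 1, so the inverse sensitivity of
# output y is exactly |y - f(x)|.
x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
true_count = int(x.sum())
candidates = list(range(len(x) + 1))
lengths = [abs(y - true_count) for y in candidates]
rng = np.random.default_rng(7)
y = inverse_sensitivity_mech(candidates, lengths, eps=6.0, rng=rng)
```

Because the score is the distance in "number of changed records" rather than a worst-case global sensitivity, the noise adapts to how hard each wrong answer is to reach from the actual dataset, which is the source of the instance-optimality discussed above.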