Goto

Collaborating Authors

 optimality


Optimal sequential tests yield log-optimal e-processes

arXiv.org Machine Learning

It has been recently shown that e-processes are sufficient for sequential testing in the following sense: every level-$α$ sequential test can be obtained by thresholding an e-process at $1/α$. However, in the above result, neither does the test have to be asymptotically optimal (in terms of stopping times) nor does the e-process have to be asymptotically log-optimal. It has separately been shown that asymptotically log-optimal e-processes yield asymptotically optimal sequential tests. In this paper, we prove the converse, arguably completing the story: it is possible to aggregate asymptotically optimal sequential tests into asymptotically log-optimal e-processes. This is accomplished by using a new class of WAIT e-processes: those that are Weighted Aggregates of Indicators of stopping Times that begin at zero, are nondecreasing and increase to infinity under the alternative at the optimal rate. Importantly, the paper discusses several nuances in the varied definitions of asymptotic (log-)optimality.


02bf86214e264535e3412283e817deaa-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their insightful feedback, and we appreciate the opportunity to improve our paper. We will1 address typos and notational inconsistencies in the updated version.2 Response to Reviewer 1:3 We would like to emphasize that Theorem 1 is the most important contribution of our paper due to its generality.4 By considering the set of all possible classifiers, it provides lower bounds on adversarial robustness for any pair of5 class-conditional distributions. As we show in our experimental results in Section 6, we are able to obtain lower bounds6 for arbitrary real-world datasets by constructing the empirical distribution for these. In our estimation, these results7 serve to provide theoretical validation for adversarial training for low perturbation budgets as well as to highlight the8 gap to optimality for higher budgets.9




Instance-optimal Mean Estimation Under Differential Privacy

Neural Information Processing Systems

Mean estimation under differential privacy is a fundamental problem, but worstcase optimal mechanisms do not offer meaningful utility guarantees in practice when the global sensitivity is very large. Instead, various heuristics have been proposed to reduce the error on real-world data that do not resemble the worst-case instance. This paper takes a principled approach, yielding a mechanism that is instance-optimal in a strong sense. In addition to its theoretical optimality, the mechanism is also simple and practical, and adapts to a variety of data characteristics without the need of parameter tuning. It easily extends to the local and shuffle model as well.


related work

Neural Information Processing Systems

Deterministic RLDeterministic system is often the starting case in the study of sample-efficient algorithms, where the issue of exploration and exploitation trade-off is more clearly revealed since both the transition kernel and reward function are deterministic. The seminal work [81] proposes a sample-efficient algorithm for Q-learning that works for a family of function classes. Recently, [21] studies the agnostic setting where the optimal Q-function can only be approximated by a function class with approximation error. The algorithm in [21] learns the optimal policy with the number of trajectories linear with the eluder dimension. Consider MDPM where the transition is deterministic. Assume the function class in Definition 3.1 satisfies Assumption 2.1 and Assumption 2.2. For any t (0,1), if d Ω(log(BW/λ))and n d poly(κ,k,λ,BW,Bϕ,H,log(d/t)), then with probability at least 1 tAlgorithm 1 returns the optimal policy π .


Supplementary Material: Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

Neural Information Processing Systems

In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time. However, existing work has mainly focused on studying first-order optimality guarantees. On the other side, second-order optimality guaranteed algorithms, i.e., algorithms escaping saddle points, have been extensively studied in the nondistributed optimization literature. In this paper, we study a new local algorithm called Bias-Variance Reduced Local Perturbed SGD (BVR-L-PSGD), that combines the existing bias-variance reduced gradient estimator with parameter perturbation to find second-order optimal points in centralized nonconvex distributed optimization. BVR-L-PSGD enjoys second-order optimality with nearly the same communication complexity as the best known one of BVR-L-SGD to find first-order optimality. Particularly, the communication complexity is better than non-local methods when the local datasets heterogeneity is smaller than the smoothness of the local loss. In an extreme case, the communication complexity approaches to eΘ(1)when the local datasets heterogeneity goes to zero.



Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond 1+α Moments

Neural Information Processing Systems

There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the fundamental limits of what we can extract from limited and valuable data. The state of the art results for mean estimation in R are 1) the optimal sub-Gaussian mean estimator by [Lee and Valiant, 2022], attaining the optimal sub-Gaussian error constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [Bubeck, Cesa-Bianchi and Lugosi, 2013] and a matching lower bound by [Devroye, Lerasle, Lugosi, and Oliveira, 2016], characterizing the big-O optimal errors for distributions that have tails heavy enough that only a 1 + α moment exists for some α (0,1). Both of these results, however, are optimal only in the worst case. Motivated by the recent effort in the community to go "beyond the worst-case analysis" of algorithms, we initiate the fine-grained study of the mean estimation problem: Is it possible for algorithms to leverage beneficial features/quirks of their input distribution to beat the sub-Gaussian rate, without explicit knowledge of these features? We resolve this question, finding an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". Given a distribution p, assuming only that it has a finite mean and absent any additional assumptions, we show how to construct a distribution qn,δ such that the means of p and q are well-separated, yet p and q are impossible to distinguish with n samples with probability 1 δ, and q further preserves the finiteness of moments of p.


Optimality and Stability in Federated Learning: AGame-theoretic Approach

Neural Information Processing Systems

Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players).