Entropy testing and its application to testing Bayesian networks
This paper studies the problem of entropy identity testing: given sample access to a distribution $p$ and a fully described distribution $q$ (both discrete distributions over a domain of size $k$), and the promise that either $p = q$ or $|H(p) - H(q)| \geq \varepsilon$, where $H(\cdot)$ denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability.
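To make the problem statement concrete, here is a minimal Python sketch of the quantities involved. It computes the Shannon entropy and runs a naive plug-in test that compares the empirical entropy of the samples with $H(q)$. The function names, the `eps / 2` threshold, and the plug-in approach itself are assumptions chosen for illustration; this is a simple baseline, not the tester analysed in the paper, and it carries no sample-complexity guarantee.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i log p_i in nats, with 0 log 0 := 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def empirical_distribution(samples, k):
    """Empirical frequencies of samples drawn from the domain {0, ..., k-1}."""
    counts = Counter(samples)
    n = len(samples)
    return [counts.get(i, 0) / n for i in range(k)]


def naive_entropy_identity_test(samples, q, eps):
    """Return True (accept "p = q") if the plug-in entropy of the samples
    is within eps / 2 of H(q), and False otherwise.

    Hypothetical baseline for illustration only: the threshold eps / 2 is
    an assumption, and the plug-in estimator is biased for small samples.
    """
    p_hat = empirical_distribution(samples, len(q))
    return abs(shannon_entropy(p_hat) - shannon_entropy(q)) < eps / 2
```

For example, `naive_entropy_identity_test([0, 1] * 50, [0.5, 0.5], 0.1)` accepts, because the empirical distribution matches `q` exactly, while a sample consisting only of zeros is rejected against the uniform `q` for any `eps` below about 1.38 nats.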
Simultaneous Approximation of the Score Function and Its Derivatives by Deep Neural Networks
Yakovlev, Konstantin, Puchkin, Nikita
Score estimation, the task of learning the gradient of the log density, has become a crucial part of generative diffusion models [Song and Ermon, 2019, Song et al., 2021]. These models achieve state-of-the-art performance in a wide range of domains, including image, audio, and video synthesis [Dhariwal and Nichol, 2021, Kong et al., 2021, Ho et al., 2022]. To sample from the desired distribution, one needs an accurate score function estimator along the Ornstein-Uhlenbeck process. In the context of diffusion models, score estimation is done by minimizing the denoising score matching loss over a class of neural networks [Song et al., 2021, Vincent, 2011, Oko et al., 2023]. Another recipe for score estimation is implicit score matching, proposed by Hyvärinen [2005]. Its objective includes not only the score function but also the trace of its Jacobian. A crucial research question is to determine the iteration complexity of distribution estimation given an inaccurate score function. The convergence theory of diffusion models has received much attention in recent years. Some works [De Bortoli, 2022, Chen et al., 2023b, Benton et al., 2024, Li and Yan, 2024] study SDE-based samplers under the assumption that the score estimator is L
Batched Thompson Sampling
We introduce a novel anytime batched Thompson sampling policy for multi-armed bandits where the agent observes the rewards of her actions and adjusts her policy only at the end of a small number of batches. We show that this policy simultaneously achieves a problem-dependent regret of order $O(\log(T))$ and a minimax regret of order $O(\sqrt{T\log(T)})$, while the number of batches can be bounded by $O(\log(T))$ independent of the problem instance over a time horizon $T$. We also prove that, in expectation, the instance-dependent batch complexity of our policy is of order $O(\log\log(T))$. These results indicate that Thompson sampling performs competitively with recently proposed algorithms for the batched setting, which optimize the batch structure for a given time horizon $T$ and prioritize exploration in the beginning of the experiment to eliminate suboptimal actions. Unlike these algorithms, the batched Thompson sampling algorithm we propose is an anytime policy, i.e., it operates without knowledge of the time horizon $T$, and as such it is the only anytime algorithm that achieves optimal regret with $O(\log\log(T))$ expected batch complexity. This is achieved through a dynamic batching strategy, which uses the agent's estimates to adaptively increase the batch duration.
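The batched constraint described above can be illustrated with a short Python sketch for Bernoulli arms with Beta(1, 1) priors. The posterior is frozen within a batch and updated only when the batch ends. Note the swapped-in simplification: batch lengths here simply double (a fixed geometric schedule), which is an assumption made for brevity and not the paper's dynamic, estimate-driven batching strategy. The function name and parameters are likewise hypothetical.

```python
import random


def batched_thompson_bernoulli(means, horizon, seed=0):
    """Thompson sampling where rewards are only incorporated at batch ends.

    means   : true Bernoulli means of the arms (used only to simulate rewards)
    horizon : total number of rounds to simulate
    Returns (pulls per arm, number of batches used).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # Beta posterior parameters as of the last batch end
    beta = [1] * k
    pulls = [0] * k
    t, batch_len, num_batches = 0, 1, 0

    while t < horizon:
        pending = []  # rewards observed but not yet used by the policy
        for _ in range(min(batch_len, horizon - t)):
            # Draw fresh samples from the frozen posterior each round
            draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=draws.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            pending.append((arm, reward))
            pulls[arm] += 1
            t += 1
        # Batch ends: only now update the posterior with the batch's rewards
        for arm, reward in pending:
            alpha[arm] += reward
            beta[arm] += 1 - reward
        num_batches += 1
        batch_len *= 2  # assumed geometric schedule, not the paper's rule

    return pulls, num_batches
```

With doubling batches, a horizon of 1000 rounds uses 10 batches (lengths 1, 2, 4, ..., 512, with the last one truncated), so the number of policy updates grows logarithmically in the horizon. The loop never needs the horizon to choose a batch length, which mirrors the anytime property in spirit.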
Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences
Aolaritei, Liviu, Jordan, Michael I.
We study stopping rules for stochastic gradient descent (SGD) for convex optimization from the perspective of anytime-valid confidence sequences. Classical analyses of SGD provide convergence guarantees in expectation or at a fixed horizon, but offer no statistically valid way to assess, at an arbitrary time, how close the current iterate is to the optimum. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD, constructed via nonnegative supermartingales and requiring no smoothness or strong convexity. This confidence sequence yields a simple stopping rule that is provably $\varepsilon$-optimal with probability at least $1-\alpha$, with explicit bounds on the stopping time under standard stochastic approximation stepsizes. To the best of our knowledge, these are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates for projected SGD with general convex objectives, based solely on observable trajectory quantities.
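For orientation, the following Python sketch shows the object such a certificate would be attached to: projected SGD together with a running weighted average of its iterates. Several choices are assumptions made for illustration and are not taken from the abstract: the feasible set is an l2 ball, the stepsizes are `c / sqrt(t)`, and the averaging weights are the stepsizes themselves. The confidence sequence and the stopping rule are not implemented, since the abstract does not specify them.

```python
import math


def project_l2_ball(x, radius):
    """Euclidean projection of x onto the ball {z : ||z||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def projected_sgd(stoch_grad, x0, radius, steps, c=1.0):
    """Projected SGD with stepsizes eta_t = c / sqrt(t).

    stoch_grad : callable returning a (stochastic) subgradient at x
    Returns (last iterate, stepsize-weighted average of the iterates).
    """
    x = project_l2_ball(x0, radius)
    avg = [0.0] * len(x)
    weight_sum = 0.0
    for t in range(1, steps + 1):
        eta = c / math.sqrt(t)
        # Fold the current iterate into the running average with weight eta
        weight_sum += eta
        avg = [a + eta * (xi - a) / weight_sum for a, xi in zip(avg, x)]
        # Gradient step followed by projection back onto the feasible set
        g = stoch_grad(x)
        x = project_l2_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return x, avg
```

A stopping rule of the kind described above would monitor quantities computed along this trajectory and halt once its upper confidence bound on the suboptimality of the averaged iterate falls below $\varepsilon$; here the loop simply runs for a fixed number of steps.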