Goto

Collaborating Authors

 lemma4


Classical and Quantum Speedups for Non-Convex Optimization via Energy Conserving Descent

arXiv.org Machine Learning

The Energy Conserving Descent (ECD) algorithm was recently proposed (De Luca & Silverstein, 2022) as a global non-convex optimization method. Unlike gradient descent, appropriately configured ECD dynamics escape strict local minima and converge to a global minimum, making it appealing for machine learning optimization. We present the first analytical study of ECD, focusing on the one-dimensional setting for this first installment. We formalize a stochastic ECD dynamics (sECD) with energy-preserving noise, as well as a quantum analog of the ECD Hamiltonian (qECD), providing the foundation for a quantum algorithm through Hamiltonian simulation. For positive double-well objectives, we compute the expected hitting time from a local to the global minimum. We prove that both sECD and qECD yield exponential speedup over respective gradient descent baselines--stochastic gradient descent and its quantization. For objectives with tall barriers, qECD achieves a further speedup over sECD.


A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Neural Information Processing Systems

Wegiveasmoothed analysis, showing that evenwhen contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data.



Generalization Bounds for Neural Networks via Approximate Description Length

Neural Information Processing Systems

Namely,thattheempirical lossofall the functions in the class is -close to the true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as a composition of linear function classes and non-linear functions.


NearNeighbor

Neural Information Processing Systems

We show that LSH based algorithms can be made fair, without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the rneighborhood of a query q with almost uniform probability.


An Improved Analysis of Training Over-parameterized Deep Neural Networks

Neural Information Processing Systems

Arecent lineofresearch hasshownthatgradient-based algorithms withrandom initialization can converge to the global minima of the training loss for overparameterized (i.e.,sufficiently wide)deepneuralnetworks. However,thecondition onthewidth oftheneural networktoensure theglobal convergence isvery stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n24)).



formalization

Neural Information Processing Systems

While this setup has enjoyed a lot of attention in the applied community, there hasn't be theoretical work that even formalizes the desired guarantees.



RecurrentSubmodularWelfareand MatroidBlockingSemi-Bandits

Neural Information Processing Systems

In this work, we extend the above direction to a combinatorial semi-bandit setting and study avariant of stochastic MAB, where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for afixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, only guarantees a1/2-approximation for general matroids.