Goto

Collaborating Authors

 Statistical Learning



Differentiable Quality Diversity

Neural Information Processing Systems

Quality diversity (QD) is a growing branch of stochastic optimization research that studies the problem of generating an archive of solutions that maximize a given objective function but are also diverse with respect to a set of specified measure functions. However, even when these functions are differentiable, QD algorithms treat them as "black boxes", ignoring gradient information. We present the differentiable quality diversity (DQD) problem, a special case of QD, where both the objective and measure functions are first order differentiable. We then present MAP-Elites via a Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions. Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-ofthe-art QD algorithms, highlighting DQD's promise for efficient quality diversity optimization when gradient information is available. Source code is available at https://github.com/icaros-usc/dqd.


Sparsity-Preserving Differentially Private Training of Large Embedding Models

Neural Information Processing Systems

As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGDnaively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions (106) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.



Neural Hybrid Automata Supplementary Material

Neural Information Processing Systems

A.1 Neural Hybrid Automata: Modules and Hyperparameters We provide a notation and summary table for Neural Hybrid Automata (NHA). The table serves as a quick reference for the core concepts introduced in the main text. Labels every subjtrajectory Xi with a mode z to ensure mode-conditioned decoder Fz can reconstruct it despite Neural ODE representation limitations (uniqueness of solutions given an initial condition). The only NHA hyperparameter beyond module architectural choices is m, or number of latent modes provided to the model at initialization. Performance effects of changing mhave been explored in Section 5.2 and Appendix B.2. Appendix B.2 further provides analyzes potential techniques to prune additional modes. A.2 Gradient Pathologies We provide some theoretical insights on the phenomenon of gradient pathologies with the simple example of a one-dimensional linear hybrid system with two modes and one timed jump, xt = axtt<ฯ„ bxtt>= ฯ„ t 6= ฯ„ x+t = cxtt= ฯ„ (A.1)



Practical Near Neighbor Search via Group Testing

Neural Information Processing Systems

We present a new algorithm for the approximate near neighbor problem that combines classical ideas from group testing with locality-sensitive hashing (LSH). We reduce the near neighbor search problem to a group testing problem by designating neighbors as "positives," non-neighbors as "negatives," and approximate membership queries as group tests.




Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Neural Information Processing Systems

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuneing. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure. We instantiate our framework in specific statistical learning settings - single-index models and functions of quadratic features - and show that in the latter setting three-layer networks obtain a sample complexity improvement over all existing guarantees for two-layer networks. Crucially, this sample complexity improvement relies on the ability of three-layer networks to efficiently learn nonlinear features. We then establish a concrete optimization-based depth separation by constructing a function which is efficiently learnable via gradient descent on a three-layer network, yet cannot be learned efficiently by a two-layer network. Our work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.