

It's the Great Fear of Our Time. I'm Mathematically Sure It Won't Happen.

Slate

The individual pieces create a kind of illusion. When a horse trots, is there a moment when all four of its feet are in the air at once? In the 1870s, Leland Stanford, the railroad magnate and benefactor of the university that bears his name, funded an effort to find out. The answer shocked many equestrian experts and artists: the horse's feet do leave the ground together, but not while outstretched, as commonly depicted in paintings and on carousels; they do so when tucked inward, toward the horse's belly. Surprisingly, this discovery about a horse's gait sheds light on a much more modern debate: whether A.I. is on a path to consciousness.


Minimax optimal submatrix detection: Sharp non-asymptotic rates

arXiv.org Machine Learning

Given an observation $\mathbf Y \in \mathbb{R}^{d_1\times d_2}$ from the model $\mathbf Y = \mathbf X + \mathbf E$ where $\mathbf X$ is constant and $\mathbf E$ has i.i.d. $N(0,1)$ entries, we consider the problem of detecting a planted submatrix in the mean matrix $\mathbf X$. Specifically, we aim to distinguish the null hypothesis $\mathbf X = 0$ from the alternative hypothesis in which $\mathbf X$ is non-zero only on a submatrix of size $s_1 \times s_2$ with elevated entries bounded below by $\mu > 0$. We establish a minimax lower bound characterizing how large $\mu$ must be to ensure that the two hypotheses are distinguishable with high probability. Furthermore, we derive novel minimax-optimal tests achieving the lower bound, and describe extensions of these tests that are adaptive to unknown sparsity levels $s_1$ and $s_2$. In contrast with previous work, which required restrictive assumptions on $s_1, s_2, d_1$ and $d_2$, our non-asymptotic upper and lower bounds match for any configuration of these parameters.
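The abstract does not spell out the new minimax-optimal tests, so they are not reproduced here. As a hedged illustration of the detection model only, the sketch below simulates $\mathbf Y = \mathbf X + \mathbf E$ and implements two classical baseline statistics: a one-sided sum test (suited to large, weakly elevated blocks) and a max-entry test (suited to small, strongly elevated blocks). Assumptions: the planted entries equal exactly $\mu$, and all function names are hypothetical. Neither statistic is claimed to attain the paper's rate on its own.

```python
import math
import random

def sample_observation(d1, d2, rows, cols, mu, rng):
    """Draw Y = X + E, where X equals mu on the rows x cols submatrix and 0 elsewhere,
    and E has i.i.d. N(0, 1) entries. Empty rows/cols give a draw from the null X = 0."""
    row_set, col_set = set(rows), set(cols)
    return [[(mu if i in row_set and j in col_set else 0.0) + rng.gauss(0.0, 1.0)
             for j in range(d2)]
            for i in range(d1)]

def sum_statistic(Y):
    """Standardized total sum: exactly N(0, 1) under the null; under the alternative
    its mean is s1 * s2 * mu / sqrt(d1 * d2) when the planted entries equal mu."""
    d1, d2 = len(Y), len(Y[0])
    return sum(sum(row) for row in Y) / math.sqrt(d1 * d2)

def sum_test(Y, z=3.0):
    """One-sided sum test: reject the null when the standardized sum exceeds z."""
    return sum_statistic(Y) > z

def max_test(Y, alpha=0.05):
    """Max-entry test with the union-bound threshold sqrt(2 log(d1 * d2 / alpha)),
    which keeps the null rejection probability below alpha."""
    d1, d2 = len(Y), len(Y[0])
    threshold = math.sqrt(2.0 * math.log(d1 * d2 / alpha))
    return max(max(row) for row in Y) > threshold

if __name__ == "__main__":
    rng = random.Random(0)
    Y = sample_observation(100, 100, range(30), range(30), 1.0, rng)
    print(sum_test(Y), max_test(Y))
```

With a 30 x 30 block at $\mu = 1$ in a 100 x 100 matrix, the sum statistic's mean shifts to 9, far above the threshold. A 2 x 2 block at $\mu = 8$ barely moves the sum but is easily caught by the max test.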


Information-Theoretic Generalization Bounds for Sequential Decision Making

arXiv.org Machine Learning

Information-theoretic generalization bounds based on the supersample construction are a central tool for algorithm-dependent generalization analysis in the batch i.i.d. setting. However, existing supersample conditional mutual information (CMI) bounds do not directly apply to sequential decision-making problems such as online learning, streaming active learning, and bandits, where data are revealed adaptively and the learner evolves along a causal trajectory. To address this limitation, we develop a sequential supersample framework that separates the learner filtration from a proof-side enlargement used for ghost-coordinate comparisons. Under a row-wise exchangeability assumption, the sequential generalization gap is controlled by sequential CMI, a sum of roundwise selector-loss information terms. We also establish a Bernstein-type refinement that yields faster rates under suitable variance conditions. The selector-SCMI proof strategy applies to online learning, streaming active learning with importance weighting, and stochastic multi-armed bandits.
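For background, the batch supersample bound that this abstract generalizes can be stated as follows, in its standard form for losses bounded in $[0,1]$. Draw a supersample $\tilde Z$ of $n$ rows, each holding two i.i.d. points; draw independent uniform selector bits $U \in \{0,1\}^n$; train on the selected points $\tilde Z_U = (\tilde Z_{i,U_i})_{i=1}^n$, and treat the unselected point in each row as the "ghost" coordinate. The sequential version described above replaces the single CMI term with a sum of per-round selector-loss terms; its exact conditioning follows the paper and is not reproduced in this sketch.

```latex
\left| \mathbb{E}\big[ L_{\mathcal D}(W) - L_{\tilde Z_U}(W) \big] \right|
\;\le\; \sqrt{\frac{2\, I\big(W;\, U \,\big|\, \tilde Z\big)}{n}},
\qquad W = \mathcal A(\tilde Z_U), \quad \ell \in [0,1].
```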


02bf86214e264535e3412283e817deaa-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their insightful feedback, and we appreciate the opportunity to improve our paper. We will address typos and notational inconsistencies in the updated version.

Response to Reviewer 1: We would like to emphasize that Theorem 1 is the most important contribution of our paper due to its generality. By considering the set of all possible classifiers, it provides lower bounds on adversarial robustness for any pair of class-conditional distributions. As we show in our experimental results in Section 6, we are able to obtain lower bounds for arbitrary real-world datasets by constructing their empirical distributions. In our estimation, these results serve to provide theoretical validation for adversarial training at low perturbation budgets, as well as to highlight the gap to optimality at higher budgets.
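The rebuttal does not show how Theorem 1 is evaluated on empirical distributions. As a hedged illustration of why a classifier-agnostic lower bound can be computed from data at all, the sketch below uses a simple matching argument, which is a stand-in and not necessarily the paper's Theorem 1. If a class-0 point and a class-1 point lie within $2\epsilon$ of each other, their midpoint is within $\epsilon$ of both, so every classifier misclassifies at least one of the two under an $\epsilon$-bounded attack. Disjoint conflicting pairs each force one error, so a maximum matching in this conflict graph lower-bounds the adversarial error under the uniform empirical distribution over all points. The Euclidean norm and all function names are assumptions.

```python
import math

def conflict_graph(class0, class1, eps):
    """adj[i] lists the class-1 indices j whose point lies within 2*eps (Euclidean) of class-0 point i."""
    return [[j for j, b in enumerate(class1) if math.dist(a, b) <= 2 * eps] for a in class0]

def max_matching(adj, n_right):
    """Maximum bipartite matching via augmenting paths (Kuhn's algorithm)."""
    match_right = [-1] * n_right  # match_right[v] = left vertex matched to v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # Use v if it is free, or if its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return sum(1 for u in range(len(adj)) if try_augment(u, [False] * n_right))

def adversarial_error_lower_bound(class0, class1, eps):
    """Each matched conflicting pair forces at least one error for ANY classifier,
    so matching size / total points lower-bounds the empirical adversarial error."""
    m = max_matching(conflict_graph(class0, class1, eps), len(class1))
    return m / (len(class0) + len(class1))

if __name__ == "__main__":
    print(adversarial_error_lower_bound([(0.0,), (1.0,)], [(0.5,), (3.0,)], eps=1.0))
```

Because the bound holds for every classifier, comparing it with the robust error of a trained model at the same $\epsilon$ shows how far that model is from the best achievable, which is the kind of gap the rebuttal refers to.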



Enhancing molecular dynamics with equivariant machine-learned densities

arXiv.org Machine Learning

Machine-learning interatomic potentials (MLIPs) have enabled molecular dynamics at near ab initio accuracy, yet remain limited to energies and forces by construction, leaving electronic observables such as dipole moments and polarizabilities inaccessible. We introduce DenSNet, a density-first approach to machine-learned electronic structure that learns the Hohenberg-Kohn map from nuclear configurations to the ground-state electron density. Our approach employs an SE(3)-equivariant neural network to predict density coefficients of a flexible atom-centered Gaussian basis, combined with a $\Delta$-learning strategy that uses superposed atomic densities as a prior to accelerate training. A second equivariant network then maps the predicted density to the total energy, providing a unified framework for molecular dynamics and electronic structure. We validate DenSNet on ethanol, ethanethiol, and resorcinol, where infrared spectra from machine-learned trajectories show excellent agreement with experimental gas-phase measurements. To test scalability, we train on polythiophene oligomers with 1 to 6 monomers and extrapolate to chains of up to 12 monomers, generating stable long-time trajectories whose infrared spectra agree with reference density functional theory calculations. Here, we show that reinstating the electron density as the central learned quantity opens a practical route to transferable prediction of spectroscopic and electronic observables in large-scale molecular simulations.
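To show why predicting density coefficients makes electronic observables accessible, here is a minimal sketch under strong simplifying assumptions. It uses only s-type Gaussians with a hypothetical set of exponents shared by all atoms (DenSNet's "flexible" basis is likely richer), atomic units, and no neural network: the $\Delta$-learning step is just "prior coefficients plus a predicted correction". For s-type Gaussians the density's first moment is analytic, so the dipole follows directly as $\boldsymbol\mu = \sum_a Z_a \mathbf R_a - \int \mathbf r\,\rho(\mathbf r)\,d\mathbf r$. All names here are hypothetical.

```python
import math

# Hypothetical s-type exponents shared by every atom (an illustrative assumption).
ALPHAS = [0.5, 2.0, 8.0]

def gaussian_integral(alpha):
    """Integral of exp(-alpha |r|^2) over R^3."""
    return (math.pi / alpha) ** 1.5

def delta_learning_coeffs(prior, delta):
    """Delta-learning: final coefficients = prior (e.g. superposed atomic densities) + predicted correction."""
    return [[p + d for p, d in zip(p_a, d_a)] for p_a, d_a in zip(prior, delta)]

def density(r, positions, coeffs):
    """rho(r) = sum_a sum_k c[a][k] * exp(-alpha_k |r - R_a|^2)."""
    total = 0.0
    for R, c_a in zip(positions, coeffs):
        d2 = sum((ri - Ri) ** 2 for ri, Ri in zip(r, R))
        total += sum(c * math.exp(-alpha * d2) for c, alpha in zip(c_a, ALPHAS))
    return total

def atom_electron_count(c_a):
    """Electrons carried by one atom's Gaussians: sum_k c_k * (pi / alpha_k)^(3/2)."""
    return sum(c * gaussian_integral(alpha) for c, alpha in zip(c_a, ALPHAS))

def electron_count(coeffs):
    return sum(atom_electron_count(c_a) for c_a in coeffs)

def dipole_moment(positions, charges, coeffs):
    """mu = sum_a Z_a R_a - integral r rho(r) dr. For an s-type Gaussian centred at R_a,
    the integral of r times the Gaussian is R_a times its integral, so mu = sum_a (Z_a - N_a) R_a."""
    mu = [0.0, 0.0, 0.0]
    for R, Z, c_a in zip(positions, charges, coeffs):
        n_a = atom_electron_count(c_a)
        for x in range(3):
            mu[x] += (Z - n_a) * R[x]
    return mu
```

In a trajectory setting, evaluating `dipole_moment` at each frame would give the dipole time series from which infrared spectra are commonly derived; that pipeline is not shown here.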




Efficient Active Learning for Gaussian Process Classification by Error Reduction

Neural Information Processing Systems

Active learning sequentially selects the best instance for labeling by optimizing an acquisition function to enhance data/label efficiency. The selection can be either from a discrete instance set (pool-based scenario) or a continuous instance space (query synthesis scenario). In this work, we study both active learning scenarios for Gaussian Process Classification (GPC). The existing active learning strategies that maximize the Estimated Error Reduction (EER) aim at reducing the classification error after training with the newly acquired instance in a one-step look-ahead manner. The computation of EER-based acquisition functions is typically prohibitive as it requires retraining the GPC with every new query.
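The cost described above comes from the one-step look-ahead: every candidate $x$ and every possible label $y$ requires refitting the model and re-scoring the pool. The sketch below makes that loop explicit for the pool-based scenario. To keep it short and dependency-free it swaps the GPC for a hypothetical kernel-weighted label average with a uniform prior; this is a stand-in classifier, not the paper's method, and all names are illustrative.

```python
import math

def kernel(a, b, length=1.0):
    """Squared-exponential similarity between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def predict_proba(x, data):
    """Stand-in probabilistic classifier (NOT a GPC): P(y = 1 | x) as a
    kernel-weighted label average with one pseudo-count per class."""
    weights = [kernel(x, xi) for xi, _ in data]
    positive = sum(w for w, (_, yi) in zip(weights, data) if yi == 1)
    return (1.0 + positive) / (2.0 + sum(weights))

def expected_error(pool, data):
    """Model-estimated 0-1 error over the pool: mean of 1 - max_c P(c | u)."""
    total = 0.0
    for u in pool:
        p1 = predict_proba(u, data)
        total += 1.0 - max(p1, 1.0 - p1)
    return total / len(pool)

def eer_select(pool, data):
    """Return the pool point whose label is expected to reduce the error most."""
    best_x, best_score = None, math.inf
    for x in pool:
        p1 = predict_proba(x, data)
        score = 0.0
        for y, p_y in ((1, p1), (0, 1.0 - p1)):
            # Refit with the hypothetical label (x, y): this per-(candidate, label)
            # retraining is the step that makes EER expensive for a GPC.
            score += p_y * expected_error(pool, data + [(x, y)])
        if score < best_score:
            best_x, best_score = x, score
    return best_x, best_score

if __name__ == "__main__":
    print(eer_select([-2.0, -1.0, 0.0, 1.0, 2.0], [(-3.0, 0), (3.0, 1)]))
```

With a pool of size $N$ and two labels, one selection step performs $2N$ refits, each followed by an $O(N)$ pass over the pool, which is why cheaper EER approximations matter for GPC.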