jacobian-vector product
RandALO: Out-of-sample risk estimation in no time flat
Nobel, Parth T., LeJeune, Daniel, Candès, Emmanuel J.
Training machine learning models is often expensive, especially in large data settings. Beyond the significant cost of fitting an individual model, the best model must be chosen from a set of candidates indexed by "hyperparameters", and each candidate must be fitted and evaluated in order to make the optimal selection. As a result, model selection, also called hyperparameter tuning, tends to be the most computationally expensive part of the machine learning pipeline. To evaluate models, we typically need to set aside unseen "holdout" data to estimate the risk of the model on new samples from the training distribution. When training samples are abundant, numbering in the millions or billions, we can afford to set aside a modest holdout set of tens of thousands of examples without compromising model performance.
Scalable learning of potentials to predict time-dependent Hartree-Fock dynamics
Bhat, Harish S., Gupta, Prachi, Isborn, Christine M.
We propose a framework to learn the time-dependent Hartree-Fock (TDHF) inter-electronic potential of a molecule from its electron density dynamics. Though the entire TDHF Hamiltonian, including the inter-electronic potential, can be computed from first principles, we use this problem as a testbed to develop strategies that can be applied to learn \emph{a priori} unknown terms that arise in other methods/approaches to quantum dynamics, e.g., emerging problems such as learning exchange-correlation potentials for time-dependent density functional theory. We develop, train, and test three models of the TDHF inter-electronic potential, each parameterized by a four-index tensor of size up to $60 \times 60 \times 60 \times 60$. Two of the models preserve Hermitian symmetry, while one model preserves an eight-fold permutation symmetry that implies Hermitian symmetry. Across seven different molecular systems, we find that accounting for the deeper eight-fold symmetry leads to the best-performing model across three metrics: training efficiency, test set predictive power, and direct comparison of true and learned inter-electronic potentials. All three models, when trained on ensembles of field-free trajectories, generate accurate electron dynamics predictions even in a field-on regime that lies outside the training set. To enable our models to scale to large molecular systems, we derive expressions for Jacobian-vector products that enable iterative, matrix-free training.
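As a minimal illustration of the matrix-free idea mentioned above (not the authors' derived expressions), the following sketch uses JAX's built-in `jax.jvp` and `jax.vjp` to compute Jacobian products without ever forming the Jacobian. The function `f` and the vectors `x`, `v`, and `u` are hypothetical stand-ins chosen for this example.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Hypothetical nonlinear map R^3 -> R^2, standing in for a model residual.
    return jnp.array([x[0] * x[1], jnp.sin(x[2]) + x[0] ** 2])

x = jnp.array([1.0, 2.0, 0.5])   # point at which to linearize
v = jnp.array([0.1, -0.2, 0.3])  # tangent vector in input space
u = jnp.array([1.0, -1.0])       # cotangent vector in output space

# Forward mode: returns f(x) and J(x) @ v in one pass, without forming J.
y, jv = jax.jvp(f, (x,), (v,))

# Reverse mode: vjp_fn(u) returns a tuple holding J(x)^T @ u, again without forming J.
_, vjp_fn = jax.vjp(f, x)
(jtu,) = vjp_fn(u)
```

An iterative solver only needs these two products, so memory scales with the size of the input and output vectors rather than with the size of the full Jacobian.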
JAXbind: Bind any function to JAX
Roth, Jakob, Reinecke, Martin, Edenhofer, Gordian
JAX is widely used in machine learning and scientific computing, the latter of which often relies on existing high-performance code that we would ideally like to incorporate into JAX. Reimplementing the existing code in JAX is often impractical, and the existing interface in JAX for binding custom code either limits the user to a single Jacobian product or requires deep knowledge of JAX and its C++ backend for general Jacobian products. With JAXbind we drastically reduce the effort required to bind custom functions implemented in other programming languages to JAX, with full support for Jacobian-vector products and vector-Jacobian products. Specifically, JAXbind provides an easy-to-use Python interface for defining custom JAX primitives. Via JAXbind, any function callable from Python can be exposed as a JAX primitive. JAXbind allows a user to interface the JAX function transformation engine with custom derivatives and batching rules, enabling all JAX transformations for the custom primitive.
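For context on what a custom derivative rule looks like, here is a minimal sketch using JAX's own `jax.custom_jvp` decorator. This is JAX's built-in mechanism, not the JAXbind interface, whose API is not described in the abstract. The function `log1pexp` is an illustrative choice.

```python
import jax
import jax.numpy as jnp

@jax.custom_jvp
def log1pexp(x):
    # Primal computation: log(1 + exp(x)).
    return jnp.log1p(jnp.exp(x))

@log1pexp.defjvp
def log1pexp_jvp(primals, tangents):
    # Hand-written Jacobian-vector product: d/dx log(1 + exp(x)) = sigmoid(x).
    (x,), (t,) = primals, tangents
    y = log1pexp(x)
    return y, t * jax.nn.sigmoid(x)

# The rule is linear in the tangent t, so JAX can also derive reverse-mode
# gradients from it, and batching via vmap works as usual.
g = jax.grad(log1pexp)(1.0)
batched = jax.vmap(jax.grad(log1pexp))(jnp.array([-1.0, 0.0, 1.0]))
```

Here `g` is the derivative at a single point and `batched` holds the derivative at each of the three inputs.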
An Efficient Newton Method for Extreme Similarity Learning with Nonlinear Embeddings
Yuan, Bowen, Li, Yu-Sheng, Quan, Pengrui, Lin, Chih-Jen
We study the problem of learning similarity from all possible pairs by using nonlinear embedding models (e.g., neural networks). This problem is well known for being difficult to train because of the extreme number of pairs. Existing optimization methods extended from stochastic gradient methods suffer from slow convergence and high complexity per pass over all possible pairs. Inspired by recent works reporting that Newton methods are competitive for training certain types of neural networks, in this work we apply the Newton method to this problem. A prohibitive cost depending on the extreme number of pairs occurs if the Newton method is directly applied. We propose an efficient algorithm that eliminates this cost. Our proposed algorithm takes advantage of second-order information and has lower time complexity per pass over all possible pairs. Experiments conducted on large-scale data sets demonstrate that the proposed algorithm is more efficient than existing algorithms.