Lawrence, Hannah
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Lawrence, Hannah, Portilheiro, Vasco, Zhang, Yan, Kaba, Sékou-Oumar
Equivariance encodes known symmetries into neural networks, often enhancing generalization. However, equivariant networks cannot break symmetries: the output of an equivariant network must, by definition, have at least the same self-symmetries as the input. This poses an important problem, both (1) for prediction tasks on domains where self-symmetries are common, and (2) for generative models, which must break symmetries in order to reconstruct from highly symmetric latent spaces. This fundamental limitation can be addressed by considering equivariant conditional distributions, instead of equivariant functions. We present novel theoretical results that establish necessary and sufficient conditions for representing such distributions. Concretely, this representation provides a practical framework for breaking symmetries in any equivariant network via randomized canonicalization. Our method, SymPE (Symmetry-breaking Positional Encodings), admits a simple interpretation in terms of positional encodings. This approach expands the representational power of equivariant networks while retaining the inductive bias of symmetry, which we justify through generalization bounds. Experimental results demonstrate that SymPE significantly improves performance of group-equivariant and graph neural networks across diffusion models for graphs, graph autoencoders, and lattice spin system modeling.
Equivariant Frames and the Impossibility of Continuous Canonicalization
Dym, Nadav, Lawrence, Hannah, Siegel, Jonathan W.
Canonicalization provides an architecture-agnostic method for enforcing equivariance, with generalizations such as frame-averaging recently gaining prominence as a lightweight and flexible alternative to equivariant architectures. Recent works have found an empirical benefit to using probabilistic frames instead, which learn weighted distributions over group elements. In this work, we provide strong theoretical justification for this phenomenon: for commonly-used groups, there is no efficiently computable choice of frame that preserves continuity of the function being averaged. In other words, unweighted frame-averaging can turn a smooth, non-symmetric function into a discontinuous, symmetric function. To address this fundamental robustness problem, we formally define and construct \emph{weighted} frames, which provably preserve continuity, and demonstrate their utility by constructing efficient and continuous weighted frames for the actions of $SO(2)$, $SO(3)$, and $S_n$ on point clouds.
Learning Polynomial Problems with $SL(2,\mathbb{R})$ Equivariance
Lawrence, Hannah, Harris, Mitchell Tong
Optimizing and certifying the positivity of polynomials are fundamental primitives across mathematics and engineering applications, from dynamical systems to operations research. However, solving these problems in practice requires large semidefinite programs, with poor scaling in dimension and degree. In this work, we demonstrate for the first time that neural networks can effectively solve such problems in a data-driven fashion, achieving tenfold speedups while retaining high accuracy. Moreover, we observe that these polynomial learning problems are equivariant to the non-compact group $SL(2,\mathbb{R})$, which consists of area-preserving linear transformations. We therefore adapt our learning pipelines to accommodate this structure, including data augmentation, a new $SL(2,\mathbb{R})$-equivariant architecture, and an architecture equivariant with respect to its maximal compact subgroup, $SO(2, \mathbb{R})$. Surprisingly, the most successful approaches in practice do not enforce equivariance to the entire group, which we prove arises from an unusual lack of architecture universality for $SL(2,\mathbb{R})$ in particular. A consequence of this result, which is of independent interest, is that there exists an equivariant function for which there is no sequence of equivariant polynomials multiplied by arbitrary invariants that approximates the original function. This is a rare example of a symmetric problem where data augmentation outperforms a fully equivariant architecture, and provides interesting lessons in both theory and practice for other problems with non-compact symmetries.
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Zhang, Xuan, Wang, Limei, Helwig, Jacob, Luo, Youzhi, Fu, Cong, Xie, Yaochen, Liu, Meng, Lin, Yuchao, Xu, Zhao, Yan, Keqiang, Adams, Keir, Weiler, Maurice, Li, Xiner, Fu, Tianfan, Wang, Yucheng, Yu, Haiyang, Xie, YuQing, Fu, Xiang, Strasser, Alex, Xu, Shenglong, Liu, Yi, Du, Yuanqi, Saxton, Alexandra, Ling, Hongyi, Lawrence, Hannah, Stärk, Hannes, Gui, Shurui, Edwards, Carl, Gao, Nicholas, Ladera, Adriana, Wu, Tailin, Hofgard, Elyssa F., Tehrani, Aria Mansouri, Wang, Rui, Daigavane, Ameya, Bohde, Montgomery, Kurtin, Jerry, Huang, Qian, Phung, Tuong, Xu, Minkai, Joshi, Chaitanya K., Mathis, Simon V., Azizzadenesheli, Kamyar, Fang, Ada, Aspuru-Guzik, Alán, Bekkers, Erik, Bronstein, Michael, Zitnik, Marinka, Anandkumar, Anima, Ermon, Stefano, Liò, Pietro, Yu, Rose, Günnemann, Stephan, Leskovec, Jure, Ji, Heng, Sun, Jimeng, Barzilay, Regina, Jaakkola, Tommi, Coley, Connor W., Qian, Xiaoning, Qian, Xiaofeng, Smidt, Tess, Ji, Shuiwang
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.
Self-Supervised Learning with Lie Symmetries for Partial Differential Equations
Mialon, Grégoire, Garrido, Quentin, Lawrence, Hannah, Rehman, Danyal, LeCun, Yann, Kiani, Bobak T.
Dynamical systems governed by differential equations are ubiquitous in fluid dynamics, chemistry, astrophysics, and beyond. Accurately analyzing and predicting the evolution of such systems is of paramount importance, inspiring decades of innovation in algorithms for numerical methods. However, high-accuracy solvers are often computationally expensive. Machine learning has recently arisen as an alternative method for analyzing differential equations at a fraction of the cost [1, 2, 3]. Typically, the neural network for a given equation is trained on simulations of that same equation, generated by numerical solvers that are high-accuracy but comparatively slow [4]. What if we instead wish to learn from heterogeneous data, e.g., data with missing information, or gathered from actual observation of varied physical systems rather than clean simulations? For example, we may have access to a dataset of instances of time-evolution, stemming from a family of partial differential equations (PDEs) for which important characteristics of the problem, such as viscosity or initial conditions, vary and/or are unknown. In this case, representations learned from such a large, "unlabeled" dataset could still prove useful in learning to identify unknown characteristics, given only a small dataset "labeled" with viscosities or reaction constants. Alternatively, the "unlabeled" dataset may contain evolutions over very short periods of time, or with missing time intervals; possible goals are then to learn representations that could be useful in filling in these gaps, or regressing other quantities of interest.
Distilling Model Failures as Directions in Latent Space
Jain, Saachi, Lawrence, Hannah, Moitra, Ankur, Madry, Aleksander
The composition of the training dataset has key implications for machine learning models' behavior [Fel19; CLK+19; KL17; GZ19; IPE+22], especially as the training environments often deviate from deployment conditions [RGL19; KSM+20; HBM+20]. For example, a model might struggle on specific subpopulations in the data if that subpopulation was mislabeled [NAM21; SC18; BHK+20; VCG+22], underrepresented [SKH+20; STM21], or corrupted [HD19; HBM+20]. More broadly, the training dataset might contain spurious correlations, encouraging the model to depend on prediction rules that do not generalize to deployment [XEI+20; GJM+20; DJL21]. Moreover, identifying meaningful subpopulations within data allows for dataset refinement (such as filtering or relabeling) [YQF+19; SC18], and training more fair [KGZ19; DYZ+21] or accurate [JFK+20; SHL20] models. However, dominant approaches to such identification of biases and difficult subpopulations within datasets often require human intervention, which is typically labor intensive and thus not conducive to routine usage.
Implicit Bias of Linear Equivariant Networks
Lawrence, Hannah, Georgiev, Kristian, Dienes, Andrew, Kiani, Bobak T.
Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of scientific and technical applications by explicitly encoding group symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by the explicit symmetry bias of their convolutional architecture, a recent line of work has proposed that the implicit bias of training algorithms on a particular parameterization (or architecture) is key to understanding generalization for overparameterized neural nets. In this context, we show that $L$-layer full-width linear G-CNNs trained via gradient descent in a binary classification task converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative symmetry groups (such as permutations). We validate our theorems via experiments on a variety of groups and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.
Phase Retrieval with Holography and Untrained Priors: Tackling the Challenges of Low-Photon Nanoscale Imaging
Lawrence, Hannah, Barmherzig, David A., Li, Henry, Eickenberg, Michael, Gabrié, Marylou
Phase retrieval is the inverse problem of recovering a signal from magnitude-only Fourier measurements, and underlies numerous imaging modalities, such as Coherent Diffraction Imaging (CDI). A variant of this setup, known as holography, includes a reference object that is placed adjacent to the specimen of interest before measurements are collected. The resulting inverse problem, known as holographic phase retrieval, is well-known to have improved problem conditioning relative to the original. This innovation, i.e. Holographic CDI, becomes crucial at the nanoscale, where imaging specimens such as viruses, proteins, and crystals require low-photon measurements. This data is highly corrupted by Poisson shot noise, and often lacks low-frequency content as well. In this work, we introduce a dataset-free deep learning framework for holographic phase retrieval adapted to these challenges. The key ingredients of our approach are the explicit and flexible incorporation of the physical forward model into an automatic differentiation procedure, the Poisson log-likelihood objective function, and an optional untrained deep image prior. We perform extensive evaluation under realistic conditions. Compared to competing classical methods, our method recovers signal from higher noise levels and is more resilient to suboptimal reference design, as well as to large missing regions of low frequencies in the observations. To the best of our knowledge, this is the first work to consider a dataset-free machine learning approach for holographic phase retrieval.
Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
Chen, Lin, Yu, Qian, Lawrence, Hannah, Karbasi, Amin
We study the problem of switching-constrained online convex optimization (OCO), where the player has a limited number of opportunities to change her action. While the discrete analog of this online learning task has been studied extensively, previous work in the continuous setting has neither established the minimax rate nor algorithmically achieved it. We here show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $. In particular, it is at least $ \frac{T}{\sqrt{2K}} $ for one dimension and at least $ \frac{T}{\sqrt{K}} $ for higher dimensions. The lower bound in higher dimensions is attained by an orthogonal subspace argument. The minimax analysis in one dimension is more involved. To establish the one-dimensional result, we introduce the fugal game relaxation, whose minimax regret lower bounds that of switching-constrained OCO. We show that the minimax regret of the fugal game is at least $ \frac{T}{\sqrt{2K}} $ and thereby establish the minimax lower bound in one dimension. We next show that a mini-batching algorithm provides an $ O(\frac{T}{\sqrt{K}}) $ upper bound, and therefore we conclude that the minimax regret of switching-constrained OCO is $ \Theta(\frac{T}{\sqrt{K}}) $ for any $K$. This is in sharp contrast to its discrete counterpart, the switching-constrained prediction-from-experts problem, which exhibits a phase transition in minimax regret between the low-switching and high-switching regimes. In the case of bandit feedback, we first determine a novel linear (in $T$) minimax regret for bandit linear optimization against the strongly adaptive adversary of OCO, implying that a slightly weaker adversary is appropriate. We also establish the minimax regret of switching-constrained bandit convex optimization in dimension $n>2$ to be $\tilde{\Theta}(\frac{T}{\sqrt{K}})$.