equivariance error



Flow Factorized Representation Learning - Supplementary Material - Yue Song, Andy Keller, Nicu Sebe, and Max Welling

Neural Information Processing Systems

Here we omit the computation of the HJ PDEs for conciseness. Each model is trained for 90,000 iterations. For the disentanglement methods, we substantially enrich the original MNIST dataset by adding the transformed images of the whole sequence. Generalization ability (i.e., validation accuracy) can thus be regarded as a reasonable surrogate for disentanglement ability.
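As a rough illustration of the enrichment step described above, the sketch below builds an MNIST variant that also contains every image of a transformation sequence. It is my own minimal construction, not the authors' released code: the choice of rotation as the transformation, the number of steps, and the `sequence_transforms` helper are all assumptions made only for illustration.

```python
# Minimal sketch (assumption, not the paper's code): enrich MNIST with all images
# of a transformation sequence, so validation accuracy on the enriched set can
# serve as a surrogate for disentanglement quality.
import torch
from torch.utils.data import Subset, TensorDataset
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

base = datasets.MNIST(root="./data", train=True, download=True,
                      transform=transforms.ToTensor())

def sequence_transforms(img, n_steps=8, max_angle=45.0):
    """Return the full sequence of progressively rotated copies of one image.
    Rotation is purely an illustrative choice of transformation."""
    return [TF.rotate(img, angle=(t + 1) * max_angle / n_steps) for t in range(n_steps)]

enriched_images, enriched_labels = [], []
# A small subset keeps the sketch light; in practice a lazy Dataset that applies
# the sequence on the fly would be preferable to materializing every copy.
for img, label in Subset(base, range(1000)):
    enriched_images.append(img)
    enriched_labels.append(label)
    for transformed in sequence_transforms(img):
        enriched_images.append(transformed)
        enriched_labels.append(label)

enriched = TensorDataset(torch.stack(enriched_images), torch.tensor(enriched_labels))
```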



Approximation-Generalization Trade-offs under (Approximate) Group Equivariance

Neural Information Processing Systems

The explicit incorporation of task-specific inductive biases through symmetry has emerged as a general design precept in the development of high-performance machine learning models. For example, group equivariant neural networks have demonstrated impressive performance across various domains and applications such as protein and drug design. A prevalent intuition about such models is that the integration of relevant symmetry results in enhanced generalization. Moreover, it is posited that when the data and/or the model exhibits only approximate or partial symmetry, the optimal or best-performing model is one where the model symmetry aligns with the data symmetry. In this paper, we conduct a formal unified investigation of these intuitions. To begin, we present quantitative bounds that demonstrate how models capturing task-specific symmetries lead to improved generalization. Utilizing this quantification, we examine the more general question of dealing with approximate/partial symmetries. We establish, for a given symmetry group, a quantitative comparison between the approximate equivariance of the model and that of the data distribution, precisely connecting model equivariance error and data equivariance error. Our result delineates the conditions under which the model equivariance error is optimal, thereby yielding the best-performing model for the given task and data.
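To make the two central quantities concrete, the block below gives one standard way to formalize "model equivariance error" and "data equivariance error". The notation (input distribution \mu, target function f^*, Haar measure \lambda_G) is my own assumption and may differ from the paper's exact definitions.

```latex
% One common formalization (an assumption; the paper's definitions may differ).
% G is a compact group acting on inputs and outputs, \mu is the input
% distribution, f is the model, and f^* is the target (regression) function.
\[
  \varepsilon_{\mathrm{model}}(f)
    = \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{g \sim \lambda_G}
      \bigl\|\, f(g \cdot x) - g \cdot f(x) \,\bigr\|,
  \qquad
  \varepsilon_{\mathrm{data}}
    = \mathbb{E}_{x \sim \mu}\,\mathbb{E}_{g \sim \lambda_G}
      \bigl\|\, f^{*}(g \cdot x) - g \cdot f^{*}(x) \,\bigr\|,
\]
% where \lambda_G is the Haar measure on G. The abstract's result can then be read
% as characterizing when \varepsilon_model(f) should match \varepsilon_data in order
% to obtain the best-performing model for the given task and data.
```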


Training Dynamics of Learning 3D-Rotational Equivariance

Shen, Max W., Nowara, Ewa, Maser, Michael, Cho, Kyunghyun

arXiv.org Artificial Intelligence

While data augmentation is widely used to train symmetry-agnostic models, it remains unclear how quickly and effectively they learn to respect symmetries. We investigate this by deriving a principled measure of equivariance error that, for convex losses, calculates the percent of total loss attributable to imperfections in learned symmetry. We focus our empirical investigation on 3D-rotation equivariance in high-dimensional molecular tasks (flow matching, force field prediction, denoising voxels) and find that models reduce equivariance error quickly to $\leq$2\% of held-out loss within 1k-10k training steps, a result robust to model and dataset size. This happens because learning 3D-rotational equivariance is an easier learning task, with a smoother and better-conditioned loss landscape, than the main prediction task. For 3D rotations, the loss penalty for non-equivariant models is small throughout training, so they may achieve lower test loss than equivariant models per GPU-hour unless the equivariant "efficiency gap" is narrowed. We also experimentally and theoretically investigate the relationships between relative equivariance error, learning gradients, and model parameters.
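The sketch below shows one plausible way to estimate such a relative 3D-rotation equivariance error empirically: compare predictions on rotated inputs against rotated predictions, and normalize by the total loss. This is my own hedged construction under stated assumptions (a vector-valued model mapping (N, 3) coordinates to (N, 3) outputs such as forces, and an MSE loss), not the paper's released measurement code.

```python
# Hedged sketch (assumption, not the paper's code): estimate how much of a model's
# loss is attributable to broken 3D-rotational equivariance.
import torch
from scipy.spatial.transform import Rotation

def relative_equivariance_error(model, coords, targets, n_rotations=32):
    """coords: (N, 3) positions; targets: (N, 3) vector targets (e.g. forces).
    Returns (equivariance gap) / (total MSE loss), averaged over random rotations."""
    total_loss, equiv_gap = 0.0, 0.0
    with torch.no_grad():
        for _ in range(n_rotations):
            # Uniformly random rotation matrix R in SO(3).
            R = torch.tensor(Rotation.random().as_matrix(), dtype=coords.dtype)
            pred_rot_in = model(coords @ R.T)      # f(R x)
            rot_pred = model(coords) @ R.T         # R f(x)
            total_loss += torch.mean((pred_rot_in - targets @ R.T) ** 2).item()
            equiv_gap += torch.mean((pred_rot_in - rot_pred) ** 2).item()
    return equiv_gap / max(total_loss, 1e-12)
```

A small returned value would indicate that deviations from equivariance contribute only a small fraction of the held-out loss, mirroring the abstract's $\leq$2\% observation.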


Regular CNN vs. Ideal Down-sampling (figure panel labels)

Neural Information Processing Systems

Recent works have made progress in developing scale-equivariant convolutional neural networks, e.g., through weight-sharing and kernel resizing. However, these networks are not truly scale-equivariant in practice.
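One simple way to probe the claim that such networks are not truly scale-equivariant is to compare "downsample then apply" with "apply then downsample". The sketch below is an assumption about how such a check might look, not the paper's evaluation protocol; it presumes the model returns spatial feature maps at the input resolution and uses bilinear resizing as the scaling operator.

```python
# Illustrative sketch (an assumed test, not the paper's protocol): empirically
# measure scale-equivariance error of a convolutional feature extractor.
import torch
import torch.nn.functional as F

def scale_equivariance_error(model, images, scale=0.5):
    """images: (B, C, H, W). Assumes model(images) is a spatial feature map at the
    input resolution. Returns the relative L2 gap between model(downsample(x))
    and downsample(model(x)) for one scale factor."""
    with torch.no_grad():
        down_then_apply = model(F.interpolate(images, scale_factor=scale,
                                              mode="bilinear", align_corners=False))
        apply_then_down = F.interpolate(model(images), scale_factor=scale,
                                        mode="bilinear", align_corners=False)
    gap = torch.norm(down_then_apply - apply_then_down)
    return (gap / torch.norm(apply_then_down)).item()
```

A value well above zero for an ordinary CNN would illustrate the abstract's point that weight-sharing and kernel-resizing schemes only approximate scale equivariance in practice.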


