Graph Metanetworks for Processing Diverse Neural Architectures
Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas
Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies have demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks, such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks -- neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks; we call the resulting models graph metanetworks (GMNs). We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged.

Neural networks are well-established for predicting, generating, and transforming data. A newer paradigm is to treat the parameters of neural networks themselves as data. This perspective has inspired researchers to propose neural architectures that can predict properties of trained neural networks (Eilertsen et al., 2020), generate new networks (Erkoç et al., 2023), optimize networks (Metz et al., 2022), or otherwise transform them (Navon et al., 2023; Zhou et al., 2023a). We refer to these neural networks that process other neural networks as metanetworks, or metanets for short.

Metanets enable new applications, but designing them is nontrivial. A common approach is to flatten the network parameters into a vector representation, which neglects the structure of the input network. More generally, a prominent challenge in metanet design is that the space of neural network parameters exhibits symmetries. For example, permuting the neurons in the hidden layers of a Multilayer Perceptron (MLP) leaves the network output unchanged (Hecht-Nielsen, 1990). In contrast to flattening approaches, equivariant metanets respect these symmetries: if the input network is permuted, then the metanet output is permuted in the same way. Recently, several works have proposed equivariant metanets with significantly improved performance (Navon et al., 2023; Zhou et al., 2023a;b). However, these networks typically require highly specialized, hand-designed layers that can be difficult to devise.
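To make the permutation symmetry concrete, here is a minimal NumPy sketch (not code from the paper) demonstrating it on a small two-layer MLP: permuting the hidden neurons -- the rows of the first weight matrix and bias, together with the columns of the second weight matrix -- leaves the output unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permute the 16 hidden neurons: rows of (W1, b1), columns of W2.
perm = rng.permutation(16)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# The permuted network computes exactly the same function.
x = rng.normal(size=8)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```

Any metanet that flattens (W1, b1, W2, b2) into a single vector would see these two functionally identical networks as very different inputs, which is the failure mode equivariant metanets are designed to avoid.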
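The core idea described above -- representing an input network as a graph and processing it with a graph neural network -- can be sketched as follows. This is a simplified illustration under assumptions of our own: one node per neuron, one directed edge per weight carrying the weight value as its edge feature, and a generic message-passing update. The paper's actual construction additionally handles biases, normalization layers, convolutions, and other architectures, and the function names here (mlp_to_graph, message_passing) are hypothetical.

```python
import numpy as np

def mlp_to_graph(weights):
    """Build a graph from a list of MLP weight matrices: one node per
    neuron, one directed edge per weight, with the weight value as the
    edge feature. Biases and other layer types are omitted here."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.cumsum([0] + sizes)          # node-index offset per layer
    edges, feats = [], []
    for layer, W in enumerate(weights):
        for i in range(W.shape[0]):           # output neuron of this layer
            for j in range(W.shape[1]):       # input neuron of this layer
                edges.append((offsets[layer] + j, offsets[layer + 1] + i))
                feats.append(W[i, j])
    return int(offsets[-1]), np.array(edges), np.array(feats)

def message_passing(num_nodes, edges, edge_feats, node_feats, steps=2):
    """A generic GNN update: each node aggregates messages
    (edge feature * source-node feature) over its incoming edges."""
    for _ in range(steps):
        msgs = np.zeros(num_nodes)
        np.add.at(msgs, edges[:, 1], edge_feats * node_feats[edges[:, 0]])
        node_feats = np.tanh(node_feats + msgs)
    return node_feats

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
n, edges, efeats = mlp_to_graph(weights)
out = message_passing(n, edges, efeats, node_feats=np.ones(n))
```

Because the GNN operates on the graph rather than on a flattened parameter vector, permuting the hidden neurons of the input MLP merely relabels graph nodes, so the GNN output is permuted in the same way: equivariance falls out of the representation rather than from hand-designed layers.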
Dec-29-2023