 representation cost





Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Neural Information Processing Systems

This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the 'right' inner dimension.


Representation Costs of Linear Neural Networks: Analysis and Design

Neural Information Processing Systems

For different parameterizations (mappings from parameters to predictors), we study the regularization cost in predictor space induced by $l_2$ regularization on the parameters (weights). We focus on linear neural networks as parameterizations of linear predictors. We identify the representation cost of certain sparse linear ConvNets and residual networks. In order to get a better understanding of how the architecture and parameterization affect the representation cost, we also study the reverse problem, identifying which regularizers on linear predictors (e.g., $l_p$ norms, group norms, the $k$-support-norm, elastic net) can be the representation cost induced by simple $l_2$ regularization, and designing the parameterizations that do so.
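As a minimal illustration of what a representation cost is, consider a two-layer diagonal linear parameterization, where the predictor is the elementwise product $\beta = u \odot v$. This is a standard textbook case chosen here as an assumption for illustration; it is not necessarily one of the architectures analyzed in the paper. By the AM-GM inequality, the smallest value of $\tfrac{1}{2}(\|u\|_2^2 + \|v\|_2^2)$ over all factorizations of $\beta$ equals $\|\beta\|_1$, so $l_2$ regularization on the weights acts like $l_1$ regularization on the predictor. The helper names below are hypothetical.

```python
import math

def balanced_factors(beta):
    """Factor beta = u * v (elementwise) with |u_i| = |v_i| = sqrt(|beta_i|).

    This balanced choice attains the minimum l2 parameter cost for the
    diagonal parameterization (an illustrative assumption, see lead-in).
    """
    u = [math.sqrt(abs(b)) for b in beta]
    v = [math.copysign(math.sqrt(abs(b)), b) for b in beta]
    return u, v

def l2_cost(u, v):
    """Parameter-space cost: half the squared l2 norm of all weights."""
    return 0.5 * (sum(x * x for x in u) + sum(y * y for y in v))

def rescale(u, v, c):
    """Another factorization of the same predictor: (c * u) * (v / c) = u * v."""
    return [c * x for x in u], [y / c for y in v]

beta = [3.0, -1.0, 0.0, 0.5]
u, v = balanced_factors(beta)

l1_norm = sum(abs(b) for b in beta)
balanced = l2_cost(u, v)                     # matches the l1 norm of beta
unbalanced = l2_cost(*rescale(u, v, 2.0))    # same predictor, strictly larger cost

print(l1_norm, balanced, unbalanced)
```

Rescaling the two factors leaves the predictor unchanged but raises the parameter cost, which is why the representation cost is defined as a minimum over all parameters that realize a given predictor.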






Model Fusion via Neuron Interpolation

Luenam, Phoomraphee, Spanopoulos, Andreas, Sant, Amit, Hofmann, Thomas, Anagnostidis, Sotiris, Singh, Sidak Pal

arXiv.org Artificial Intelligence

Model fusion aims to combine the knowledge of multiple models by creating one representative model that captures the strengths of all of its parents. However, this process is non-trivial due to differences in internal representations, which can stem from permutation invariance, random initialization, or differently distributed training data. We present a novel, neuron-centric family of model fusion algorithms designed to integrate multiple trained neural networks into a single network effectively regardless of training data distribution. Our algorithms group intermediate neurons of parent models to create target representations that the fused model approximates with its corresponding sub-network. Unlike prior approaches, our approach incorporates neuron attribution scores into the fusion process. Furthermore, our algorithms can generalize to arbitrary layer types. Experimental results on various benchmark datasets demonstrate that our algorithms consistently outperform previous fusion techniques, particularly in zero-shot and non-IID fusion scenarios.