Model Fusion via Optimal Transport
Singh, Sidak Pal; Jaggi, Martin
Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by resource constraints on memory and computation, which grow linearly with the number of models. We present a layer-wise model fusion procedure for neural networks that utilizes optimal transport to (soft-)align neurons across the models before averaging their associated parameters. We discuss two main algorithms for fusing neural networks in this "one-shot" manner, without requiring any retraining. Finally, we illustrate on CIFAR10 and MNIST how this fusion significantly outperforms vanilla averaging on convolutional networks (such as VGG11) and multi-layer perceptrons, and for transfer tasks even surpasses the performance of both original models.
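To make the idea of aligning neurons before averaging concrete, here is a minimal single-layer sketch. It is not the authors' implementation: the function names `sinkhorn_plan` and `fuse_layer`, the squared Euclidean cost between weight rows, the entropic (Sinkhorn) solver, and the `reg` value are all assumptions chosen for illustration. A full layer-wise procedure would presumably also have to carry each layer's alignment over to the incoming weights of the next layer, which this sketch leaves out.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.01, n_iters=200):
    """Entropic-regularized transport plan between two uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass over model A's neurons
    b = np.full(m, 1.0 / m)  # uniform mass over model B's neurons
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    # plan[i, j] is the mass moved from A's neuron i to B's neuron j
    return u[:, None] * K * v[None, :]


def fuse_layer(w_a, w_b, reg=0.01):
    """Soft-align the rows (neurons) of w_a to those of w_b, then average.

    w_a and w_b have shape (neurons, inputs) with the same input dimension.
    """
    # Pairwise squared Euclidean distance between neurons' weight vectors.
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    # Assumption: rescale the cost to [0, 1] so `reg` behaves similarly across layers.
    cost = cost / max(cost.max(), 1e-12)
    plan = sinkhorn_plan(cost, reg)
    # Barycentric projection: express A's neurons in B's neuron ordering.
    w_a_aligned = (plan.T @ w_a) * w_b.shape[0]
    return 0.5 * (w_a_aligned + w_b)


# Toy example: model B is model A with its neurons permuted.
w_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
w_b = w_a[[2, 0, 1]]
fused = fuse_layer(w_a, w_b)
naive = 0.5 * (w_a + w_b)
```

In this toy case the two layers differ only by a permutation of neurons, so the aligned average recovers the shared weights, while the plain average `naive` mixes unrelated neurons.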
Oct-12-2019
- Country:
  - Europe > Russia (0.04)
  - North America > United States
    - New York > New York County
      - New York City (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Asia
- Genre:
  - Research Report (1.00)
- Technology: