Merging Models with Fisher-Weighted Averaging

Neural Information Processing Systems 

Averaging the parameters of models that share the same architecture and initialization can combine their respective capabilities.
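As a minimal illustration of plain (unweighted) parameter averaging, the sketch below merges two toy models by taking the element-wise mean of their parameters. The function name `average_parameters`, the dictionary-of-lists representation, and the toy values are assumptions made for this example only; they are not taken from the paper, and the sketch does not include any Fisher weighting.

```python
def average_parameters(models):
    """Element-wise mean of parameter dicts that share names and sizes.

    Hypothetical helper: each model is a dict mapping a parameter name
    to a flat list of floats.
    """
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models[1:]:
        # Same architecture implies the same set of parameter names.
        if m.keys() != names:
            raise ValueError("models must share the same parameter names")
    merged = {}
    for name in names:
        vectors = [m[name] for m in models]
        # Same architecture also implies matching parameter sizes.
        if len({len(v) for v in vectors}) != 1:
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Average each coordinate across all models.
        merged[name] = [sum(vals) / len(models) for vals in zip(*vectors)]
    return merged


# Toy parameters (illustrative values only).
model_a = {"w": [1.0, 2.0], "b": [0.0]}
model_b = {"w": [3.0, 4.0], "b": [1.0]}
merged = average_parameters([model_a, model_b])
print(merged)  # {'w': [2.0, 3.0], 'b': [0.5]}
```

The name and size checks reflect the same-architecture requirement stated above. A shared initialization cannot be checked from the parameters alone, so the sketch simply assumes it.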