Weight subcloning: direct initialization of transformers using larger pretrained ones
