Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
