Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Anonymous Author(s)

Scaling laws have been recently employed to derive compute-optimal model size

Neural Information Processing Systems 

However, the simple power-law relation becomes more complicated when compute is considered.