Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training