Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction