A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy