Supervised Fine-tuning in turn Improves Visual Foundation Models