ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

School of Computer Science, The University of Sydney, Australia