Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
