Vision Encoder-Decoder Models for AI Coaching