Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset