FMimic: Foundation Models are Fine-grained Action Learners from Human Videos