Fleximo: Towards Flexible Text-to-Human Motion Video Generation