Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking