Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards