Learning to Reason in LLMs by Expectation Maximization