Fostering Video Reasoning via Next-Event Prediction