One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow