Q-Probe: A Lightweight Approach to Reward Maximization for Language Models