Reward Dropout Improves Control: Bi-objective Perspective on Reinforced LM