Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward