LLMs can learn self-restraint through iterative self-reflection