Stepwise Alignment for Constrained Language Model Policy Optimization
Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Youhei Akimoto
LY Corporation, University of Tsukuba