OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning