bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs