Inpainting-Guided Policy Optimization for Diffusion Large Language Models