On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding