Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks