GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO