Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints
