Aligning Crowd Feedback via Distributional Preference Reward Modeling