Direct Preference-based Policy Optimization without Reward Modeling