Preference-free Alignment Learning with Regularized Relevance Reward