Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions