Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment