Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons