TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
