GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

Open in new window