LIRE: listwise reward enhancement for preference alignment
