Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits - Appendix
Neural Information Processing Systems
A.1 Detailed Re-alignment Task Formulation and Training Setup

In Figure A1, we show the procedure for converting data samples from the alignment datasets into training data for AEM (negative samples used in AIL are generated similarly). Special tokens mark the boundary between the Context + Source portion and the Chain-of-Edits (CoE) + Target portion, so the LM knows where the edits begin; our decipher module later translates these special tokens back into natural language. For AEM, we fine-tune the LM on the resulting Source-CoE-Target data (shown as "Input for AEM" in Figure A1) with the standard language-modeling objective, i.e., maximizing the probability of the ground-truth token at each decoding step. By default we train for three epochs per task, with an early-stopping condition that halts training when the evaluation loss does not decrease (i.e., plateaus) for five consecutive intermediate evaluation steps.
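The input assembly and stopping rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the boundary token `<coe>` and the helper names are hypothetical, and the early-stopping check simply encodes "no improvement for five evaluations".

```python
def build_aem_input(context, source, coe, target, sep="<coe>"):
    """Assemble one AEM training sequence (illustrative sketch).

    A hypothetical special token `sep` marks the boundary between the
    conditioning text (Context + Source) and the Chain-of-Edits + Target
    that the LM is trained to generate.
    """
    return f"{context} {source} {sep} {coe} {target}"


def should_stop(eval_losses, patience=5):
    """Early-stopping rule: stop when the last `patience` evaluation
    losses have not improved on the best loss seen before them."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return all(loss >= best_before for loss in eval_losses[-patience:])
```

In a full training loop, `should_stop` would be checked after each intermediate evaluation step, and the standard language-modeling (cross-entropy) loss would be computed only over the CoE + Target tokens after the boundary.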