Interpreting Learned Feedback Patterns in Large Language Models
Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, David Krueger, Philip Torr
–Neural Information Processing Systems
A Learned Feedback Pattern (LFP) is a pattern in an LLM's activations, learned during RLHF, that improves its performance on the fine-tuning task.
Oct-10-2025, 00:22:32 GMT
- Country:
  - Europe > United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
    - Oxfordshire > Oxford (0.04)
  - North America > United States
    - Oregon > Multnomah County > Portland (0.04)
    - Texas (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Industry:
  - Leisure & Entertainment (0.93)
  - Media > Television (0.46)
- Technology: