Interpreting Learned Feedback Patterns in Large Language Models

Luke Marks, Amir Abdullah, Clement Neo