Interpreting Learned Feedback Patterns in Large Language Models