Evaluating Language Models for Generating and Judging Programming Feedback