Grading on a curve? Why AI systems test brilliantly but stumble in real life - ScienceBlog.com