(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs