Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation