Benchmarking and Improving LLM Robustness for Personalized Generation