Towards Personalized Deep Research: Benchmarks and Evaluations