Text-Conditional Contextualized Avatars For Zero-Shot Personalization