Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space