See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement