Controllable Emphasis with zero data for text-to-speech