Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis