Learning Phonetic Context-Dependent Viseme for Enhancing Speech-Driven 3D Facial Animation