Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis