Talk like a President:
How do presidents present themselves in different contexts, and how do these modes of self-presentation influence presidential popularity? This paper aims to answer these questions with a multimodal deep learning pipeline that extracts information from the text, audio, and image data of presidential speeches. We first walk through the motivation and a brief literature review. We then introduce, in order, the seven models we ran: the text encoders (FastText and BERT), the audio models (a CNN audio classifier and a CNN emotion recognizer), the image models (EfficientNet and a CNN emotion recognizer), and the multimodal prediction model (a self-defined RankNet, sketched below).
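The section does not spell out the RankNet's implementation, but as a rough illustration of the pairwise-ranking idea, here is a minimal PyTorch sketch of a RankNet-style scorer over fused multimodal features. The embedding dimensions, layer sizes, and fusion by concatenation are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class RankNet(nn.Module):
    """Pairwise ranking model: scores a fused multimodal feature vector,
    trained on pairs of speeches so that the more popular speech
    receives the higher score (standard RankNet formulation)."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Hypothetical scorer; the paper's layer sizes and fusion
        # strategy may differ.
        self.scorer = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x_i: torch.Tensor, x_j: torch.Tensor) -> torch.Tensor:
        # P(speech i ranks above speech j) = sigmoid(s_i - s_j)
        s_i = self.scorer(x_i)
        s_j = self.scorer(x_j)
        return torch.sigmoid(s_i - s_j)

# Example usage with assumed embedding sizes: a 768-d text embedding
# concatenated with 128-d audio and 128-d image embeddings.
dim = 768 + 128 + 128
feats_i = torch.randn(8, dim)   # batch of 8 "speech i" feature vectors
feats_j = torch.randn(8, dim)   # matching batch of "speech j" vectors
model = RankNet(input_dim=dim)
prob_i_over_j = model(feats_i, feats_j)
labels = torch.ones(8, 1)       # 1 where speech i was the more popular
loss = nn.BCELoss()(prob_i_over_j, labels)
```

Training on score differences rather than raw popularity values is what distinguishes this pairwise setup from plain regression: only the relative ordering of speeches within each pair drives the gradient.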