Analyzing the first Presidential Debate
A significant chunk of the data we encounter on a daily basis comes in an unstructured, free-text format, so the ability to glean useful information from it can be quite valuable. In this post, we attempt a basic analysis of the text of the first Presidential debate between Clinton and Trump. A good part of the post covers the data manipulation steps that convert the raw debate transcript into a more structured, ordered form suitable for further analysis and modelling. This transformation is a key step in any text analytics effort, and hence a key focus of this post. Once the data is structured, we try to answer a few simple questions, such as who spoke more, who interrupted more, and what the key discussion points were.
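To make the structuring step concrete, here is a minimal sketch in Python. It is not the method used in the post itself. It assumes the transcript marks each turn with an upper-case speaker label followed by a colon, and it treats a turn that ends in "..." or "--" as having been cut off by the next speaker; both are assumptions about the transcript format. The sample text, the speaker labels, and the function names (`parse_turns`, `words_per_speaker`, `interruptions`) are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical excerpt, invented for illustration; not actual debate text.
SAMPLE = """\
MODERATOR: Good evening and welcome.
CANDIDATE_A: Thank you. My plan starts with jobs and...
CANDIDATE_B: That plan failed before.
CANDIDATE_A: ...it continues with schools.
CANDIDATE_B: We need a different approach.
"""

# Assumed format: an upper-case speaker label, a colon, then the spoken text.
TURN = re.compile(r"^([A-Z][A-Z_]*):\s*(.*)$")


def parse_turns(text):
    """Convert raw transcript text into an ordered list of speaker turns."""
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match:
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            # A line without a label continues the previous turn.
            turns[-1]["text"] += " " + line
    return turns


def words_per_speaker(turns):
    """Count the words spoken by each speaker."""
    counts = Counter()
    for turn in turns:
        counts[turn["speaker"]] += len(re.findall(r"[A-Za-z']+", turn["text"]))
    return counts


def interruptions(turns):
    """Count interruptions per interrupting speaker.

    Assumed heuristic: a turn ending in "..." or "--" was cut off by
    whoever speaks next.
    """
    counts = Counter()
    for prev, nxt in zip(turns, turns[1:]):
        cut_off = prev["text"].rstrip().endswith(("...", "--"))
        if cut_off and nxt["speaker"] != prev["speaker"]:
            counts[nxt["speaker"]] += 1
    return counts


turns = parse_turns(SAMPLE)
print(words_per_speaker(turns).most_common())
print(interruptions(turns).most_common())
```

Once the turns are in this form, "who spoke more" and "who interrupted more" reduce to simple counts over the list. A real transcript would likely need extra cleanup first, such as stripping audience annotations or normalizing speaker names.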
Oct-3-2016, 17:36:14 GMT