Understanding Language in Conversations

"The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).
Text data preparation is different for each problem. Preparation starts with simple steps, like loading data, but quickly gets difficult with cleaning tasks that are very specific to the data you are working with. It helps to know where to begin and in what order to work through the steps from raw data to data ready for modeling.
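As a rough illustration of what those first steps can look like, here is a minimal sketch of a cleaning pass using only the Python standard library. The function name `clean_text` and the particular order of steps (lowercase, drop URLs, strip punctuation, split on whitespace) are assumptions made for this example, not a recommended pipeline; real data usually needs rules specific to its source.

```python
import re
import string


def clean_text(raw):
    """Turn one raw string into a list of lowercase word tokens.

    Hypothetical helper: the steps and their order are illustrative only.
    """
    text = raw.lower()
    # Remove URLs before stripping punctuation, so they do not
    # collapse into long meaningless tokens.
    text = re.sub(r"https?://\S+", " ", text)
    # Delete ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on any run of whitespace.
    return text.split()


tokens = clean_text("Loved it!! See https://example.com NOW")
print(tokens)
```

Stripping every punctuation character also removes apostrophes and hashtags, which may or may not be what a given dataset needs; that is the kind of data-specific decision the paragraph above refers to.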
Natural language processing technologies have become quite sophisticated over the past few years. From tech giants to hobbyists, many are rushing to build rich interfaces that can analyze, understand, and respond to natural language. Amazon's Alexa, Microsoft's Cortana, Google Assistant, and Apple's Siri all aim to change the way we interact with computers. Sentiment analysis, a subfield of natural language processing, consists of techniques that determine the tone of a text or speech. Today, with machine learning and large amounts of data harvested from social media and review sites, we can train models to identify the sentiment of a natural language passage with fair accuracy.
A machine says this social post mentions ABC and flags several negative words, classifying it as a negative sentiment post for ABC.

Using statistical methods to calculate the appropriate sample size at a 95% confidence level with a ±5% margin of error, you'd need to examine 385 randomly sampled Tweets to represent the whole population accurately. In my scoring, I've assigned a score of 1 to any positive Tweet, 0 to something neutral or irrelevant, and -1 to something negative. When you're done, you can tally up the positives, negatives, and neutrals, report on overall sentiment, and state that your sentiment analysis is at a 95% confidence level with a ±5% margin of error.

It takes significant investments of time, people, and effort, but if you want truly accurate sentiment analysis in your gathered public opinion data, it's the only way to go for now.
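The figure of 385 can be reproduced with the standard sample-size formula for a proportion in a large population, n = z² · p(1 − p) / e², taking z = 1.96 for 95% confidence, e = 0.05, and the most conservative proportion p = 0.5. The sketch below assumes that formula is the "statistical method" meant here; the function names and the short list of scores are hypothetical and only illustrate the tallying step.

```python
import math
from collections import Counter


def sample_size(z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a large population.

    Assumes no finite-population correction; p=0.5 maximizes p*(1-p),
    giving the largest (safest) sample size.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def tally(scores):
    """Count hand-assigned scores: 1 positive, 0 neutral, -1 negative."""
    counts = Counter(scores)
    return {
        "positive": counts[1],
        "neutral": counts[0],
        "negative": counts[-1],
        # Mean score: a simple overall-sentiment figure in [-1, 1].
        "mean": sum(scores) / len(scores),
    }


print(sample_size())  # 385

# Made-up scores for illustration only; a real run would hold
# one hand-assigned score per sampled Tweet.
example_scores = [1, 0, -1, -1, 0, 1, 1, -1]
print(tally(example_scores))
```

The sample should be drawn at random from the full set of collected Tweets (for instance with `random.sample`); otherwise the stated confidence level and margin of error do not apply.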