Data scientists must always remember that data sets are not objective: they are selected, collected, filtered, structured, and analyzed by human design. Overt and hidden biases in selecting, collecting, structuring, and analyzing data present serious risks. For example, a recent Wall Street Journal article entitled "Tweets Provide New Way to Gauge TV Audiences" provides evidence of a disconnect between mainstream viewers and people who use Twitter. The chart above shows the disconnect between the most popular and the most tweeted shows: the most tweeted show is not a top-ten show. Twitter data can be useful for detecting trends and sentiment in certain areas (e.g., disease surveillance, natural disaster surveillance, product sentiment, financial trading, politics) when it is analyzed with scientific methods and in limited circumstances, but it can also mislead and present a false view of reality.
Mar-23-2016, 06:00:10 GMT