Validity

A lot of the discussion around Matt Jockers' Syuzhet package (involving Annie Swafford, Ted Underwood, Andrew Piper, Scott Weingart and many others) has focused on issues of validity -- whether sentiment analysis is accurate enough for the task, whether the Fourier transform is an appropriate method for dimensionality reduction, and whether the emotional trajectories themselves are valid measurements of anything at all (Scott has a good enumeration of the various issues here). Andrew's discussion of the validity of inherently subjective measurements inspired me to solicit at least one data point from readers that we can use for one question under discussion with Syuzhet: what does a human judgment of the "emotional trajectory" of a work look like, and how often do readers agree with each other on this task?

This method of soliciting human judgments for inherently subjective tasks is at the core of NLP and a lot of machine learning -- syntactic parsing, part of speech tagging, named entity recognition, topic classification, sentiment analysis, and lots of other tasks all rely on humans making judgments that are often surprisingly difficult in practice. Learning algorithms in these cases are not so much learning any notion of "truth" as simply learning to reproduce the human judgments they're given.

Agreement rates between humans are often seen as a proxy for the complexity of the task; if humans can't agree, it can be a sign that the task is ill-defined or underspecified. Word sense disambiguation is one good example of this, with low inter-annotator agreement rates [Snyder and Palmer 2004]. And while sentiment analysis was originally designed with product/movie reviews in mind (does person X like product Y?) -- i.e., attitude with respect to a particular target -- I think the more general sentiment-as-tone problem (is this tweet happy or sad?) is much less well specified: it is not clear that its answer can be judged by anyone but the original author.

One aspect of these kinds of annotations that I think is much less explored (which Piper points to and I think would be an extremely interesting area to work on) is the case where multiple judgments are simultaneously valid -- different interpretations of the same phenomenon, each backed by its own argument.
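
To make "agreement rate" concrete, here is a minimal sketch of one common chance-corrected measure, Cohen's kappa, for two annotators. The function name, the "pos"/"neg" label scheme, and the two readers' judgments are all made up for illustration; this is not the measure or the data used in the Syuzhet discussion, just one way such agreement could be quantified.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative helper (hypothetical, not from any package discussed here).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labeled at random
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up judgments from two readers on six passages of a novel.
reader_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
reader_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]

kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 3))
```

In this toy example the readers agree on four of six passages, but half of that agreement would be expected by chance, so kappa comes out well below the raw agreement rate. A continuous "emotional trajectory" would need a different measure (a correlation, for example); the categorical case is shown only because it is the simplest.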
