Using Crowdsourcing to Improve Profanity Detection
Sood, Sara Owsley (Pomona College) | Antin, Judd (Yahoo! Research) | Churchill, Elizabeth (Yahoo! Research)
Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. They fail to adapt to evolving profane slang and to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of what is profane or inappropriate hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
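The recall problem the abstract describes can be illustrated with a minimal sketch of a list-based filter. The word list, function name, and example comments below are hypothetical, not taken from the authors' system; the disguised variants (@ss, biatch) come from the abstract.

```python
import re

# Hypothetical word list of the kind the abstract critiques.
PROFANITY_LIST = {"ass", "bitch"}

def list_based_flag(comment: str) -> bool:
    """Flag a comment only if it contains a listed term verbatim."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(tok in PROFANITY_LIST for tok in tokens)

examples = ["you ass", "you @ss", "what a biatch"]
flags = [list_based_flag(c) for c in examples]
# Only the verbatim spelling is caught; the disguised and misspelled
# variants slip through, which is the poor recall the abstract notes.
```

A learned classifier trained on crowdsourced labels, as the article proposes, can pick up such variants from the labeled examples themselves rather than from a fixed list.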
Mar-25-2012