Using Crowdsourcing to Improve Profanity Detection
Sood, Sara Owsley (Pomona College) | Antin, Judd (Yahoo! Research) | Churchill, Elizabeth (Yahoo! Research)
Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. First, they fail to adapt to evolving profane slang and fail to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
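The recall problem described in the abstract can be illustrated with a minimal sketch of a list-based filter. This is not the authors' system or any baseline from the paper; the word list, the function name `list_based_flag`, and the tokenization rule are all assumptions chosen for illustration, and the listed words are mild stand-ins for real profane terms.

```python
import re

# Hypothetical blocklist; mild stand-ins for real profane terms.
BLOCKLIST = {"darn", "heck"}


def list_based_flag(comment: str) -> bool:
    """Flag a comment if any alphabetic token exactly matches a blocklist entry."""
    # Lowercase the text and split it into runs of letters (an assumed tokenizer).
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in BLOCKLIST for token in tokens)


if __name__ == "__main__":
    samples = [
        "what the heck",    # exact match
        "what the h3ck",    # character substitution
        "oh d@rn",          # partial censoring with a symbol
        "what the heeeck",  # elongated spelling
    ]
    for text in samples:
        print(f"{text!r}: flagged={list_based_flag(text)}")
```

Under these assumptions only the exact match is flagged; the substituted, censored, and elongated variants pass through, which is the kind of miss that lowers recall for list-based approaches.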
Mar-25-2012
- Country:
- North America > United States
- California (0.14)
- Nevada (0.14)
- Industry:
- Law > Civil Rights & Constitutional Law (0.34)
- Technology: