Classifying Toxic Comments with Natural Language Processing


Whether you have a Medium account, run a YouTube channel, or play League of Legends, you have probably seen toxic comments somewhere on the internet. Toxic behavior, which includes rude, hateful, and threatening remarks, can derail a productive comment thread and turn it into a battle. An artificial intelligence that identifies and classifies toxic comments would therefore be a great help to many online groups and communities. The data for this project can be found on Kaggle. The dataset contains hundreds of thousands of comments, each labelled with zero or more of the following traits: toxic, severe toxic, obscene, threat, insult, and identity hate. Here are two examples, one toxic comment and one non-toxic comment, along with their labels.

