How I achieved 90% accuracy on a text classification problem with ZERO preprocessing
I chose the AG News benchmark dataset. I retrieved the training and test sets from John Snow Labs (a must-see reference for all things NLP). The dataset is divided into four balanced categories for a total of 120,000 rows, as seen below. It is formatted into two columns, category and description. Because I want this to be a succinct post, I will refer you to my previous article to find out how to use Spark NLP in Colab.
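If you want to check the balance claim yourself, here is a minimal sketch using only the Python standard library. It assumes the file is a CSV with a header row naming the two columns `category` and `description`; the helper name `category_counts` and the sample rows are made up for illustration and are not taken from the real dataset.

```python
import csv
import io
from collections import Counter


def category_counts(csv_file):
    """Count rows per category in a CSV that has 'category' and 'description' columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"] for row in reader)


# Made-up sample rows standing in for the real file (assumption: same two-column layout).
sample = io.StringIO(
    "category,description\n"
    "World,Example description about world affairs\n"
    "Sports,Example description about a match\n"
    "Business,Example description about markets\n"
    "Sci/Tech,Example description about a gadget\n"
)

counts = category_counts(sample)
for category, n in sorted(counts.items()):
    print(category, n)
```

On the real training file you would pass an open file handle instead of `sample`. With four balanced categories and 120,000 rows, each category should come out at 30,000.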
Mar-27-2021, 17:45:22 GMT