Automatic Webpage Classification • /r/MachineLearning

@machinelearnbot 

I'm trying to create a document classifier for webpages, but I can't think of which features to use. Does anybody have experience with this? I used Beautiful Soup to strip the HTML tags. I know tf-idf can be used, but I'm not exactly sure how. Suggestions on how to 'clean' the data better (e.g. removing stop words, stemming, etc.) are also welcome.
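For anyone landing here with the same question, here is a rough sketch of what tf-idf features could look like once the tags are already stripped. It uses only the Python standard library so the arithmetic is visible. The stop-word list, the sample `pages`, and the helper names `tokenize` and `tfidf_vectors` are all made up for illustration. It uses the plain `log(N / df)` form of idf; libraries often apply smoothed variants, so their numbers will differ. Stemming is not covered.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "on", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document it appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


# Hypothetical page texts, standing in for tag-stripped webpages.
pages = [
    "The stock market is up and investors are buying shares",
    "The football team won the match in extra time",
    "Investors sold shares as the market fell",
]
vectors = tfidf_vectors(pages)
```

Each dict in `vectors` is a sparse feature vector for one page. Terms that appear in fewer pages get a higher weight, and a term that appears in every page gets weight zero. These vectors can then be passed to whatever classifier you pick.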
