Natural Language Identification Machine Learning Pipeline with Python and Scikit-Learn
Graduate project for Harvard's Python for Data Science course (CSCI E-29). In this project, I pulled text data from the European Parliament Proceedings in 21 languages. Using Scikit-Learn, I transformed the raw text into a numerical feature matrix and trained a multinomial Naive Bayes model to classify the language of input text with greater than 99% accuracy.
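The two steps described above (vectorizing raw text, then fitting a multinomial Naive Bayes model) can be sketched as a single Scikit-Learn pipeline. This is a minimal illustration, not the project's actual code: the toy sentences and labels are invented stand-ins for the European Parliament data, and the choice of character n-gram counts as features is an assumption, since the summary does not say how the feature matrix was built.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: invented sentences standing in for the
# European Parliament text (the real project covered 21 languages).
train_texts = [
    "The committee approved the report on agriculture.",
    "We must discuss the budget for next year.",
    "This proposal will be debated tomorrow morning.",
    "Le comité a approuvé le rapport sur l'agriculture.",
    "Nous devons discuter du budget pour l'année prochaine.",
    "Cette proposition sera débattue demain matin.",
    "Der Ausschuss hat den Bericht über die Landwirtschaft angenommen.",
    "Wir müssen über den Haushalt für das nächste Jahr sprechen.",
    "Dieser Vorschlag wird morgen früh erörtert.",
]
train_labels = ["en", "en", "en", "fr", "fr", "fr", "de", "de", "de"]

# Step 1: turn raw text into a sparse count matrix.
# Assumption: character 1- to 3-grams within word boundaries as features.
# Step 2: fit a multinomial Naive Bayes classifier on those counts.
language_clf = Pipeline([
    ("vectorizer", CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
    ("classifier", MultinomialNB()),
])
language_clf.fit(train_texts, train_labels)

# Classify unseen text; predict_proba gives one probability per language.
new_texts = [
    "The members voted on the proposal.",
    "Die Mitglieder stimmten über den Vorschlag ab.",
]
predictions = language_clf.predict(new_texts)
probabilities = language_clf.predict_proba(new_texts)

for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

Wrapping both steps in a `Pipeline` means the vectorizer's vocabulary is learned only from the training text and reused unchanged at prediction time. With a corpus this small the predictions are only illustrative; the accuracy figure quoted above refers to the full dataset, not to this sketch.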
May-7-2018, 21:06:32 GMT
- Technology: