Text Processing in R
This tutorial goes over some basic concepts and commands for text processing in R. R is not the only way to process text, nor is it always the best way. Python is the de-facto programming language for processing text, with a lot of built-in functionality that makes it easy to use, and pretty fast, as well as a number of very mature and full featured packages such as NLTK and textblob. Basic shell scripting can also be many orders of magnitude faster for processing extremely large text corpora -- for a classic reference see Unix for Poets. Yet there are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. Furthermore, there is a lot of very active development going on in the R text analysis community right now (see especially the quanteda package).
Mar-11-2018, 22:45:44 GMT