Text Data Preprocessing: A Walkthrough in Python

Mar-26-2018, 20:57:53 GMT–@machinelearnbot

In a pair of previous posts, we first discussed a framework for approaching textual data science tasks, then followed up with a general approach to preprocessing text data. This post is a practical walkthrough of a text data preprocessing task using some common Python tools. Our goal is to start from what we will call a chunk of text (not to be confused with text chunking), that is, a lengthy, unprocessed single string, and end up with a list (or several lists) of cleaned tokens that would be useful for further text mining and/or natural language processing tasks. First we start with our imports. If you have NLTK installed but still need to download any of its additional data, see here.
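To make the goal concrete, here is a minimal sketch of the string-to-tokens path described above. It is not the original post's code: the post uses NLTK, while this sketch sticks to the Python standard library so it runs without any data downloads. The function name `preprocess` and the sample string are illustrative assumptions, and the whitespace split is only a stand-in for a proper tokenizer such as NLTK's `word_tokenize`.

```python
import string
import unicodedata


def preprocess(text):
    """Turn one raw string into a list of cleaned, lowercase tokens.

    Hypothetical helper for illustration; a real pipeline would likely
    use an NLTK tokenizer instead of str.split().
    """
    # Decompose accented characters, then drop anything that is not ASCII
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase and split on whitespace (simplified tokenization)
    tokens = text.lower().split()
    # Strip punctuation from both ends of each token
    tokens = [t.strip(string.punctuation) for t in tokens]
    # Discard tokens that were punctuation only
    return [t for t in tokens if t]


sample = "Our goal: go from a chunk of text, a lengthy string, to CLEANED tokens!"
tokens = preprocess(sample)
print(tokens)
```

Stripping punctuation only at token edges keeps internal characters intact, so a word like "don't" survives as a single token. Whether that is desirable depends on the downstream task.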

artificial intelligence, natural language, tokenization



