TF-IDF Refresher
Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document within a collection, or corpus. Simply put, TF-IDF scores a word higher the more often it appears in a given document and lower the more documents in the collection contain it. Before text can be classified, it must be translated into some form of numerical representation, a process often called text embedding or vectorization. The resulting representation, usually a vector, can then be used as input to a wide range of classification models. TF-IDF is one of the most popular approaches for turning text into numerical vectors for modeling, information retrieval, and text mining.
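To make the definition concrete, here is a minimal sketch of one common TF-IDF variant. The specific choices are assumptions for illustration, not something the text above prescribes: term frequency is the raw count divided by document length, inverse document frequency is the unsmoothed natural log of (number of documents / number of documents containing the term), and tokenization is a plain whitespace split. The function name `tf_idf` and the sample sentences are hypothetical. Library implementations often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative variant)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = count / document length, idf = ln(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the pets are cats and dogs",
]
weights = tf_idf(docs)

# Print the terms of the first document, highest weight first.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}  {weight:.4f}")
```

In this toy corpus "the" occurs in every document, so its weight is zero, while "cat" and "mat" occur only in the first document and receive the highest weights there. That is the "relative importance" the definition describes.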
Oct-13-2020, 01:31:38 GMT