Text Similarity w/ Levenshtein Distance in Python
In this article I will go over the intuition behind how Levenshtein distance works and how to use Levenshtein distance in building a plagiarism detection pipeline. Identifying similarity between text is a common problem in NLP and is used by many companies world wide. The most common application of text similarity comes from the form of identifying plagiarized text. Educational facilities ranging from elementary school, high school, college and universities all around the world use services like Turnitin to ensure that the work submitted by students is original and their own. Other applications of text similarity is commonly used by companies which have a similar structure to Stack Overflow or Stack Exchange.
Mar-15-2022, 16:40:22 GMT