Evaluating Text Output in NLP: BLEU at your own risk

#artificialintelligence 

One question I get fairly often from folks who are just getting into NLP is how to evaluate systems when the output of that system is text, rather than some sort of classification of the input text. These types of problems, where you put some text into your model and get some other text out of it, are known as sequence to sequence or string transduction problems. This sort of technology is right out of science fiction. With such a wide range of exciting applications, it's easy to see why sequence to sequence modeling is more popular than ever. What's not easy is actually evaluating these systems. Unfortunately for folks who are just getting started, there's no simple answer about what metric you should use to evaluate your model. Even worse, one of the most popular metrics for evaluating sequence to sequence tasks, BLEU, has major drawbacks, especially when applied to tasks that it was never intended to evaluate.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found