Behavioral Testing of NLP models with CheckList

#artificialintelligence 

When developing an NLP model, it's a standard practice to test how well a model generalizes to unseen examples by evaluating it on a held-out dataset. Suppose we reach our target performance metric of 95% on a held-out dataset and thus deploy the model to production based on this single metric. But, when real users start using it, the story could be completely different than what our 95% performance metric was saying. Our model might perform poorly even on simple variations of the training text. In contrast, the field of software engineering uses a suite of unit tests, integration tests, and end-to-end tests to evaluate all aspects of the product for failures.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found