
In each competition, numerous practitioners repeatedly evaluated their progress against a holdout set that forms the basis of a public ranking availablethroughout the competition. Performance on a separate test set used only oncedetermined the final ranking.