CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents