rich Documents with Layout Annotations from Web Crawl Data

Maurice Weber, ETH Zurich
Carlo Siebenschuh, University of Chicago
Rory M. Butler