CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
