Newspaper Navigator
University of Washington Computer Science
Welcome to the Newspaper Navigator dataset! This dataset consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War I-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project. The dataset also includes text corresponding to the visual content, obtained by extracting the Optical Character Recognition (OCR) text that falls within each predicted bounding box. For example, if the model predicted a bounding box around a headline, the corresponding text is a machine-readable version of that headline; likewise, for a photograph, illustration, or map, this text often contains the title and caption.
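The step of pairing each predicted bounding box with the OCR text inside it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Newspaper Navigator pipeline itself. The `OcrWord` record, the `words_in_box` helper, and the rule of keeping a word when its center falls inside the predicted box are all hypothetical. The sketch also assumes that the OCR word coordinates and the predicted box use the same pixel coordinate system.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical OCR token: the recognized text plus its pixel-space box.
@dataclass
class OcrWord:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical predicted region as (x_min, y_min, x_max, y_max), assumed to
# share the OCR coordinate system.
Box = Tuple[float, float, float, float]


def words_in_box(words: List[OcrWord], box: Box) -> str:
    """Join, in the given OCR order, the words whose center lies inside box."""
    x_min, y_min, x_max, y_max = box
    kept = []
    for word in words:
        # Use the word's center point so that boxes slightly overlapping the
        # region's edge are still assigned to exactly one side.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        if x_min <= cx <= x_max and y_min <= cy <= y_max:
            kept.append(word.text)
    return " ".join(kept)


# Toy page: two headline words and one unrelated word further down.
page = [
    OcrWord("WAR", 100, 50, 80, 30),
    OcrWord("NEWS", 190, 50, 90, 30),
    OcrWord("Weather", 600, 900, 120, 20),
]
headline_box = (90, 40, 300, 90)
print(words_in_box(page, headline_box))  # WAR NEWS
```

The center-point test is one design choice among several. A stricter variant would require the entire word box to lie inside the predicted region, and a looser one would accept any overlap. Which rule works best depends on how tightly the detector's boxes fit the printed content.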
May-6-2020, 21:24:28 GMT