Tackling the US Government's PDF Mountain With Computer Vision
Adobe's PDF format has become so entrenched in US government document pipelines that the number of state-issued PDFs currently in existence is conservatively estimated to be in the hundreds of millions. These PDFs are often opaque and lack metadata, and many were generated by automated systems. Taken together, they tell no stories; if you don't know exactly what you're looking for, you'll probably never find a pertinent document. And if you did know, you probably didn't need to search. However, a new project is using computer vision and other machine learning approaches to turn this almost unapproachable mountain of data into a valuable, explorable resource for researchers, historians, journalists and scholars.

When the US government discovered Adobe's Portable Document Format (PDF) in the 1990s, it decided that it liked it.
Dec-28-2021, 06:35:22 GMT