Tackling the US Government's PDF Mountain With Computer Vision

Dec-28-2021, 06:35:22 GMT – #artificialintelligence

Adobe's PDF format has entrenched itself so deeply in US government document pipelines that the number of state-issued documents currently in existence is conservatively estimated to be in the hundreds of millions. Often opaque and lacking metadata, these PDFs – many created by automated systems – collectively tell no stories or sagas; if you don't know exactly what you're looking for, you'll probably never find a pertinent document. And if you do know, you probably don't need the search. However, a new project is using computer vision and other machine learning approaches to turn this almost unapproachable mountain of data into a valuable and explorable resource for researchers, historians, journalists and scholars. When the US government discovered Adobe's Portable Document Format (PDF) in the 1990s, it decided that it liked it.

dataset, government department, us government department

News Web Page

AI-Alerts:
- 2021 > 2021-12 > AAAI AI-Alert for Dec 28, 2021 (1.00)

Country:
- North America > United States > District of Columbia > Washington (0.04)

Genre:
- Research Report (0.69)

Industry:
- Government > Regional Government > North America Government > United States Government (1.00)

Technology:
- Information Technology
  - Information Management > Search (1.00)
  - Artificial Intelligence > Machine Learning (1.00)