Using LLMs to create analytical datasets: A case study of reconstructing the historical memory of Colombia
Anderson, David; Benitez, Galia; Bjarnadottir, Margret; Reyya, Shriyan
arXiv.org Artificial Intelligence
Colombia has been submerged in decades of armed conflict, yet until recently, the systematic documentation of violence was not a priority for the Colombian government. This has resulted in a lack of publicly available conflict information and, consequently, a lack of historical accounts. This study contributes to Colombia's historical memory by utilizing GPT, a large language model (LLM), to read and answer questions about over 200,000 violence-related newspaper articles in Spanish. We use the resulting dataset to conduct both descriptive analysis and a study of the relationship between violence and the eradication of coca crops, offering an example of policy analyses that such data can support. Our study demonstrates how LLMs have opened new research opportunities by enabling examinations of large text corpora at a previously infeasible depth.
Sep-8-2025
- Country:
  - North America > United States
    - Maryland > Prince George's County
      - College Park (0.14)
    - Michigan > Ingham County
      - East Lansing (0.04)
      - Lansing (0.04)
  - Pacific Ocean (0.04)
  - South America > Colombia
    - Arauca Department > Arauca (0.04)
    - Bolivar Department (0.04)
    - Cauca Department (0.04)
    - Huila Department (0.04)
    - Putumayo Department (0.04)
    - Southwest Colombia (0.04)
- Genre:
  - Research Report > Experimental Study (0.46)