Text clustering applied to data augmentation in legal contexts
Freitas, Lucas José Gonçalves; Rodrigues, Thaís; Rodrigues, Guilherme; Edokawa, Pamella; Farias, Ariane
arXiv.org Artificial Intelligence
Data analysis and machine learning are of preeminent importance in the legal domain, especially for tasks such as clustering and text classification. In this study, we used natural language processing tools to enhance datasets meticulously curated by experts. This process significantly improved the machine learning workflow for classifying legal texts. We used Sustainable Development Goals (SDGs) data from the United Nations 2030 Agenda as a practical case study. The clustering-based data augmentation strategy led to remarkable improvements in the accuracy and sensitivity of the classification models: for certain SDGs within the 2030 Agenda, we observed performance gains of over 15%, and in some cases the example base grew fivefold. When dealing with unclassified legal texts, clustering-based data augmentation strategies prove highly effective. They provide a valuable means of expanding the existing knowledge base without labor-intensive manual classification.
Apr-8-2024
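The abstract does not say which clustering algorithm, text representation, or label-transfer rule the authors used. The following is therefore only a minimal sketch of one common clustering-based augmentation scheme ("cluster, then label"), under stated assumptions. It represents texts as unit-normalized term-frequency vectors, runs spherical k-means seeded with the first k texts, and copies an expert label to the unlabeled members of a cluster only when that cluster's expert-labeled members agree. The function names (`augment`, `kmeans`), the `min_purity` threshold, and the SDG label strings are hypothetical illustrations and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase word tokens stand in for real legal-text preprocessing.
    return re.findall(r"\w+", text.lower())


def tf_vector(text):
    # Term-frequency vector scaled to unit length, so a dot product equals cosine similarity.
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def cosine(u, v):
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter sparse vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    total = Counter()
    for vec in vectors:
        total.update(vec)  # sums the weights term by term
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Spherical k-means; seeding with the first k vectors is a hypothetical, deterministic choice."""
    if k > len(vectors):
        raise ValueError("k must not exceed the number of texts")
    centers = [dict(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: cosine(vec, centers[c])) for vec in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no text changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:  # keep the previous center if a cluster empties out
                centers[c] = centroid(members)
    return assignment


def augment(labeled, unlabeled, k, min_purity=1.0):
    """Hypothetical cluster-then-label step: returns new (text, label) pairs for unlabeled texts."""
    texts = [t for t, _ in labeled] + list(unlabeled)
    assignment = kmeans([tf_vector(t) for t in texts], k)
    n_lab = len(labeled)
    new_examples = []
    for c in range(k):
        seed_labels = [labeled[i][1] for i in range(n_lab) if assignment[i] == c]
        if not seed_labels:
            continue  # no expert-labeled anchor in this cluster
        label, count = Counter(seed_labels).most_common(1)[0]
        if count / len(seed_labels) < min_purity:
            continue  # experts disagree inside the cluster: do not propagate
        new_examples += [
            (texts[i], label) for i in range(n_lab, len(texts)) if assignment[i] == c
        ]
    return new_examples
```

For example, with one expert-labeled text each for a water-related and an energy-related goal, `augment(labeled, unlabeled, k=2)` assigns "water sanitation law" to the water label and "solar energy power plant" to the energy label. With `k=1` both expert labels fall into a single mixed cluster, so the default `min_purity=1.0` blocks any propagation. The purity check is a design choice meant to trade recall of new examples for fewer mislabeled ones; a production pipeline would more likely use TF-IDF or embedding features and a library k-means.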