Thunder-DeID: Accurate and Efficient De-identification Framework for Korean Court Judgments
Hahm, Sungeun; Kim, Heejin; Lee, Gyuseong; Park, Hyunji; Lee, Jaejin
arXiv.org Artificial Intelligence
To balance open access to justice with personal data protection, the South Korean judiciary requires court judgments to be de-identified before they are publicly disclosed. However, the current de-identification process is inadequate for handling court judgments at scale while meeting strict legal requirements. In addition, the legal definitions and categorizations of personal identifiers are vague and ill-suited to technical solutions. To address these challenges, we propose Thunder-DeID, a de-identification framework aligned with relevant laws and practices. Specifically, we (i) construct and release the first Korean legal dataset of annotated judgments paired with lists of entity mentions, (ii) introduce a systematic categorization of Personally Identifiable Information (PII), and (iii) develop an end-to-end de-identification pipeline based on deep neural networks (DNNs). Our experimental results show that our model achieves state-of-the-art performance in the de-identification of court judgments.
Oct-17-2025