PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
Matthew Zent, Digory Smith, Simon Woodhead
arXiv.org Artificial Intelligence
Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
May-23-2025
- Country:
- Europe (1.00)
- North America > United States (0.47)
- Genre:
- Instructional Material (0.68)
- Research Report (0.64)