PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
Zent, Matthew; Smith, Digory; Woodhead, Simon
arXiv.org Artificial Intelligence
Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
May-23-2025
- Country:
- Asia > Singapore (0.04)
- Europe
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Spain (0.04)
- Sweden > Vaestra Goetaland
- Gothenburg (0.04)
- Switzerland (0.04)
- United Kingdom (0.04)
- North America
- Mexico > Mexico City
- Mexico City (0.04)
- United States > Georgia
- Fulton County > Atlanta (0.14)
- Genre:
- Instructional Material (0.68)
- Research Report (0.64)