Reducing Privacy Risks in Online Self-Disclosures with Language Models
Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu
arXiv.org Artificial Intelligence
Self-disclosure, while common and rewarding in social media interaction, also poses privacy risks. In this paper, we take the initiative to protect user-side privacy in online self-disclosure through identification and abstraction. We develop a taxonomy of 19 self-disclosure categories and curate a large corpus of 4.8K annotated disclosure spans. We then fine-tune a language model for identification, achieving over 75% Token F1. We further conduct an HCI user study, in which 82% of participants viewed the model positively, highlighting its real-world applicability. Motivated by this user feedback, we introduce the task of self-disclosure abstraction. We experiment with both one-span and three-span abstraction settings and explore multiple fine-tuning strategies. Our best model generates diverse abstractions that moderately reduce privacy risks while maintaining high utility, according to human evaluation.
Nov-15-2023