Reducing Privacy Risks in Online Self-Disclosures with Language Models

Yao Dou, Isadora Krsek, Tarek Naous, Anubha Kabra, Sauvik Das, Alan Ritter, Wei Xu

Nov-15-2023, arXiv.org Artificial Intelligence

Self-disclosure, while common and rewarding in social media interaction, also poses privacy risks. In this paper, we take the initiative to protect the user-side privacy associated with online self-disclosure through identification and abstraction. We develop a taxonomy of 19 self-disclosure categories and curate a large corpus consisting of 4.8K annotated disclosure spans. We then fine-tune a language model for identification, achieving over 75% in Token F$_1$. We further conduct an HCI user study, with 82% of participants viewing the model positively, highlighting its real-world applicability. Motivated by the user feedback, we introduce the task of self-disclosure abstraction. We experiment with both one-span abstraction and three-span abstraction settings, and explore multiple fine-tuning strategies. Our best model can generate diverse abstractions that moderately reduce privacy risks while maintaining high utility according to human evaluation.

large language model, machine learning, natural language

Country:
- Asia > Middle East
  - UAE (0.14)
- Europe (1.00)
- North America > United States (0.67)

Genre:
- Questionnaire & Opinion Survey (1.00)
- Research Report (1.00)

Industry:
- Government (0.67)
- Health & Medicine > Therapeutic Area
  - Psychiatry/Psychology (0.46)
- Information Technology > Security & Privacy (0.94)
- Media (0.70)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
    - Natural Language
      - Chatbot (0.94)
      - Large Language Model (1.00)
  - Communications > Social Media (1.00)