How Private are Language Models in Abstractive Summarization?
Anthony Hughes, Nikolaos Aletras, Ning Ma
arXiv.org Artificial Intelligence
Language models (LMs) have shown outstanding performance in text summarization, including in sensitive domains such as medicine and law. In these settings, it is important that personally identifying information (PII) included in the source document does not leak into the summary. Prior efforts have mostly focused on studying how LMs may inadvertently reveal PII from their training data. However, the extent to which LMs can provide privacy-preserving summaries given a non-private source document remains under-explored. In this paper, we perform a comprehensive study across two closed- and three open-weight LMs of different sizes and families. We experiment with prompting and fine-tuning strategies for privacy preservation on a range of summarization datasets spanning three domains. Our extensive quantitative and qualitative analysis, including human evaluation, shows that LMs often cannot prevent PII leakage in their summaries and that current widely used metrics cannot capture context-dependent privacy risks.
Dec-16-2024
- Country:
- Africa > Rwanda
- Asia
- China > Hong Kong (0.04)
- Indonesia > Bali (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Singapore (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Czechia > Prague (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom > England
- South Yorkshire > Sheffield (0.04)
- Monaco (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany > Berlin (0.04)
- Austria > Vienna (0.14)
- North America
- Canada
- Mexico > Mexico City
- Mexico City (0.04)
- Montserrat (0.04)
- United States
- California > San Francisco County
- San Francisco (0.14)
- Florida > Miami-Dade County
- Miami (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Texas > Harris County
- Houston (0.04)
- Washington > King County
- Seattle (0.04)
- South America > Chile
- Genre:
- Research Report
- Experimental Study (0.67)
- New Finding (0.67)
- Industry:
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (1.00)
- Technology: