Generating Privacy Stories From Software Documentation
Wilder Baldwin, Shashank Chintakuntla, Shreyah Parajuli, Ali Pourghasemi, Ryan Shanz, Sepideh Ghanavati
arXiv.org Artificial Intelligence
Research shows that analysts and developers treat privacy as a security concept or as an afterthought, which may lead to non-compliance and violations of users' privacy. Most current approaches, however, focus on extracting legal requirements from regulations and evaluating the compliance of software and processes with them. In this paper, we develop a novel approach based on chain-of-thought (CoT) prompting, in-context learning (ICL), and Large Language Models (LLMs) to extract privacy behaviors from various software documents prior to and during software development, and then generate privacy requirements in the format of user stories. Our results show that commonly used LLMs, such as GPT-4o and Llama 3, can identify privacy behaviors and generate privacy user stories with F1 scores exceeding 0.8. We also show that the performance of these models can be improved through parameter tuning. Our findings provide insight into using and optimizing LLMs for generating privacy requirements from software documents created prior to or throughout the software development lifecycle.

Understanding the privacy behaviors of software applications and eliciting privacy requirements during the early phases of the software development lifecycle (SDLC) are essential for developing privacy-preserving and regulatory-compliant software [1], [2]. Past research, however, shows that software analysts and developers often consider privacy a subset of security requirements or an afterthought [3], [4], and they often lack the tools needed to understand and identify the privacy behaviors of the applications they develop [5], [6]. The most common approaches for identifying and eliciting privacy requirements include conducting privacy impact assessments [7], [8] or employing goal-oriented methodologies to map privacy requirements to system processes [8]-[10].
Other works aim to extract privacy-related information from user stories or use case models [11]-[17] by leveraging Natural Language Processing (NLP) techniques and then using predefined templates to generate privacy requirements. However, these approaches mostly target specific forms of software documentation (i.e., user stories or use cases), or they rely on developers to understand how their applications handle personal information.
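The CoT-plus-ICL setup described above can be sketched as a few-shot prompt that first elicits step-by-step reasoning about a documentation sentence and then asks for a user story. The example behaviors, labels, and user-story template below are illustrative assumptions, not the authors' actual prompt or taxonomy.

```python
# Hypothetical few-shot examples pairing a documentation sentence with
# chain-of-thought reasoning, a behavior label, and a privacy user story.
FEW_SHOT_EXAMPLES = [
    {
        "sentence": "The app uploads the user's contact list to our servers "
                    "for friend matching.",
        "reasoning": "A contact list is personal data; sending it to a server "
                     "is a collection and disclosure behavior.",
        "behavior": "data collection",
        "story": "As a user, I want to give consent before my contact list "
                 "is uploaded, so that I control who can access my contacts.",
    },
    {
        "sentence": "Session logs are retained for 90 days for debugging.",
        "reasoning": "Logs may contain identifiers; keeping them for a fixed "
                     "period is a retention behavior.",
        "behavior": "data retention",
        "story": "As a user, I want session logs deleted after the stated "
                 "period, so that my activity is not stored indefinitely.",
    },
]

def build_prompt(document_sentence: str) -> str:
    """Assemble a CoT/ICL prompt that asks an LLM to reason about privacy
    behaviors in a documentation sentence and draft a privacy user story."""
    parts = [
        "You are a privacy requirements analyst. For each sentence, reason "
        "step by step about any privacy behavior it describes, name the "
        "behavior, then write a privacy requirement as a user story.",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Sentence: {ex['sentence']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Behavior: {ex['behavior']}\n"
            f"User story: {ex['story']}"
        )
    # The trailing "Reasoning:" cue prompts the model to emit its
    # step-by-step analysis before the behavior label and story.
    parts.append(f"Sentence: {document_sentence}\nReasoning:")
    return "\n\n".join(parts)
```

In practice the returned string would be sent to a model such as GPT-4o or Llama 3 via its chat API; the few-shot examples fix the output format so the behavior label and user story can be parsed from the completion.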
Jul-1-2025