Collaborating Authors

AI Researchers Estimate 97% Of EU Websites Fail GDPR Privacy Requirements - Especially User Profiling


Researchers in the US have used machine learning techniques to study the privacy policies of over a thousand representative websites based in the EU. They found that 97% of the sites studied failed to comply with at least one requirement of the GDPR, the European Union's 2018 regulatory framework, and that compliance was lowest for the requirements around 'user profiling'. The researchers write: '[Since] the privacy policy is the essential communication channel for users to understand and control their privacy, many companies updated their privacy policies after GDPR was enforced. However, most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights. Therefore, it is unclear if they comply with GDPR.' They add: 'Our results show that even after GDPR went into effect, 97% of websites still fail to comply with at least one requirement of GDPR.'

Automating the GDPR Compliance Assessment for Cross-border Personal Data Transfers in Android Applications Artificial Intelligence

Abstract: The General Data Protection Regulation (GDPR) aims to ensure that all personal data processing activities are fair and transparent for European Union (EU) citizens, regardless of whether these are carried out within the EU or anywhere else. To this end, it sets strict requirements for transferring personal data outside the EU. However, checking these requirements is a daunting task for supervisory authorities, particularly in the mobile app domain, due to the huge number of apps available and their dynamic nature. In this paper, we propose a fully automated method to assess compliance of mobile apps with the GDPR requirements for cross-border personal data transfers. We have applied the method to the 10,080 top free apps from the Google Play Store. The results reveal that there is still a very significant gap between what app providers and third-party recipients do in practice and what is intended by the GDPR. A substantial 56% of analysed apps are potentially non-compliant with the GDPR cross-border transfer requirements.

The distributed nature of today's digital systems and services not only facilitates the collection of personal data from individuals anywhere, but also their transfer to different countries around the world [1]. This raises potential risks to the privacy of individuals, as the organizations sending and receiving personal data can be subject to different data protection laws and, therefore, may not offer an equivalent level of protection.

[...] across the world [1], or shared between chains of third-party service providers [6], even without the app developer's knowledge [7]. Second, apps are distributed through global stores, enabling app providers to easily reach markets and users beyond their country of residence. In this context, there is a need for constant vigilance by the various stakeholders, including app developers, supervisory [...]

AI-enabled Automation for Completeness Checking of Privacy Policies Artificial Intelligence

Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisions of GDPR. Incomplete privacy policies might result in large fines for the violating organization as well as incomplete privacy-related software specifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automation for the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterize the privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop an automated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machine learning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check it against the completeness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseen privacy policies, our approach correctly detected 300 of the 334 violations of the completeness criteria, while producing 23 false positives. The approach thus has a precision of 92.9% and a recall of 89.8%. Compared to a baseline that applies keyword search only, our approach results in an improvement of 24.5% in precision and 38% in recall.
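The reported precision and recall follow directly from the counts in the abstract. The short Python check below is only an illustration: it assumes the 300 correct detections are the true positives, and that the 34 undetected violations (334 minus 300) are the false negatives. The function and variable names are made up for this sketch.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw detection counts."""
    precision = tp / (tp + fp)  # share of reported violations that are real
    recall = tp / (tp + fn)     # share of real violations that were reported
    return precision, recall


# Counts taken from the abstract.
total_violations = 334
true_positives = 300
false_positives = 23
# Assumption: every violation that was not detected counts as a false negative.
false_negatives = total_violations - true_positives

precision, recall = precision_recall(true_positives, false_positives, false_negatives)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints: precision = 92.9%, recall = 89.8%
```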

Compliance Generation for Privacy Documents under GDPR: A Roadmap for Implementing Automation and Machine Learning Artificial Intelligence

We shift this perspective with the Privatech project to focus on corporations and law firms as agents of compliance. To comply with data protection laws, data processors must implement accountability measures to assess and document compliance in relation to both privacy documents and privacy practices. In this paper, we survey, on the one hand, current research on GDPR automation and, on the other hand, the operational challenges that corporations face in complying with GDPR and that may benefit from new forms of automation. We attempt to bridge the gap between the two. We provide a roadmap for compliance assessment and generation by identifying compliance issues, breaking them down into tasks that can be addressed through machine learning and automation, and providing notes about related developments in the Privatech project.

A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis Machine Learning

Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data. Unfortunately, such documents are often overly complicated and filled with legal jargon, making it difficult for users to fully grasp what exactly is being collected and why. Our solution to this problem is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques. Given a website's privacy policy, the classifier identifies the associated data practice for each logical segment. These data practices (labels) are taken directly from the OPP-115 corpus. For example, the data practice "Data Retention" refers to how long a website stores a user's information. The coverage analysis allows users to determine how many of the ten possible data practices are covered, and to identify the sections that correspond to the data practices of particular interest.
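The coverage step can be sketched independently of the classifier. The Python example below is a minimal illustration, not the authors' implementation. It assumes a classifier has already assigned one label to each policy segment, and it uses the ten top-level OPP-115 category names as commonly listed (the exact spellings here are an assumption). The function `coverage_report` and the sample predictions are hypothetical.

```python
from collections import defaultdict

# Ten top-level OPP-115 data practice categories (spellings assumed).
OPP115_PRACTICES = [
    "First Party Collection/Use",
    "Third Party Sharing/Collection",
    "User Choice/Control",
    "User Access, Edit and Deletion",
    "Data Retention",
    "Data Security",
    "Policy Change",
    "Do Not Track",
    "International and Specific Audiences",
    "Other",
]


def coverage_report(segment_labels):
    """Summarise which data practices a policy covers.

    segment_labels: list of (segment_index, label) pairs, one per policy
    segment, as produced by some upstream classifier (not shown here).
    """
    sections = defaultdict(list)
    for index, label in segment_labels:
        if label not in OPP115_PRACTICES:
            raise ValueError(f"unknown data practice: {label!r}")
        sections[label].append(index)
    covered = [p for p in OPP115_PRACTICES if p in sections]
    missing = [p for p in OPP115_PRACTICES if p not in sections]
    return {"covered": covered, "missing": missing, "sections": dict(sections)}


# Hypothetical classifier output for a five-segment policy.
predictions = [
    (0, "First Party Collection/Use"),
    (1, "Third Party Sharing/Collection"),
    (2, "First Party Collection/Use"),
    (3, "Data Retention"),
    (4, "Data Security"),
]

report = coverage_report(predictions)
print(f"{len(report['covered'])} of {len(OPP115_PRACTICES)} data practices covered")
print("Data Retention segments:", report["sections"].get("Data Retention", []))
```

Keeping the segment indices per label is what lets a reader jump straight to the sections for a practice of interest, such as "Data Retention".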