Wilson, Shomir



Automatic Extraction of Opt-Out Choices from Privacy Policies

AAAI Conferences

Online “notice and choice” is an essential concept in the US FTC’s Fair Information Practice Principles. Privacy laws based on these principles include requirements for providing notice about data practices and allowing individuals to exercise control over those practices. Internet users need control over their privacy, but their options are hidden in long privacy policies that are cumbersome to read and understand. In this paper, we describe several approaches to automatically extract choice instances from privacy policy documents using natural language processing and machine learning techniques. We define a choice instance as a statement in a privacy policy that indicates the user has discretion over the collection, use, sharing, or retention of their data. We describe supervised machine learning approaches for automatically extracting instances containing opt-out hyperlinks and evaluate the proposed methods using the OPP-115 Corpus, a dataset of annotated privacy policies. Extracting information about privacy choices and controls enables the development of concise and usable interfaces that help Internet users better understand the choices offered by online services. The focus of this paper is on methods that automatically extract useful opt-out hyperlinks from privacy policies.
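As a rough illustration of the kind of supervised extraction the abstract describes, the sketch below trains a sentence-level classifier over hyperlink-bearing policy sentences. The TF-IDF features, the logistic regression model, and the toy training examples are assumptions made for illustration, not the paper's actual pipeline or data.

# Minimal sketch: classify policy sentences that contain hyperlinks as
# opt-out choice instances or not. Features, model choice, and training
# examples are illustrative assumptions, not the paper's exact method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: hyperlink-bearing sentences from privacy
# policies, labeled 1 if the link offers an opt-out choice.
sentences = [
    "To opt out of interest-based ads, visit the preferences page here.",
    "Read our cookie policy at the link below for more information.",
]
labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(sentences, labels)

# Predict whether a new hyperlink-bearing sentence is an opt-out instance.
print(model.predict(["Click here to unsubscribe from marketing emails."]))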


Analyzing Vocabulary Intersections of Expert Annotations and Topic Models for Data Practices in Privacy Policies

AAAI Conferences

Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest-quality annotations require law experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by law experts and topics learned from Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between the vocabulary sets that a logistic regression model identifies as most significant for each category and the vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
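A minimal sketch of the vocabulary-intersection idea follows, assuming TF-IDF features, a single binary expert category, and a handful of toy policy segments; the corpus, category set, and model settings used in the paper differ.

# Compare the top terms of NMF topics with the top terms a logistic
# regression assigns to an expert-annotated category. Data, the category
# name, and the top-k cutoff are illustrative assumptions.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

segments = [
    "we share your information with third party advertising partners",
    "you may opt out of email marketing at any time",
    "we retain your data for as long as your account is active",
    "cookies are used to collect usage information on our site",
]
# Hypothetical expert labels: 1 = "Third Party Sharing", 0 = other.
labels = [1, 0, 0, 1]

vec = TfidfVectorizer()
X = vec.fit_transform(segments)
vocab = np.array(vec.get_feature_names_out())

# Unsupervised side: top terms of each NMF topic.
nmf = NMF(n_components=2, init="nndsvd", random_state=0).fit(X)
topic_terms = {t: set(vocab[np.argsort(comp)[-10:]])
               for t, comp in enumerate(nmf.components_)}

# Supervised side: top positive-weight terms for the category.
clf = LogisticRegression(max_iter=1000).fit(X, labels)
category_terms = set(vocab[np.argsort(clf.coef_[0])[-10:]])

# A larger intersection suggests a stronger topic-category match.
for t, terms in topic_terms.items():
    print(t, len(terms & category_terms))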


The Metacognitive Loop: An Architecture for Building Robust Intelligent Systems

AAAI Conferences

What commonsense knowledge do intelligent systems need in order to recover from failures or deal with unexpected situations? It is impractical to represent predetermined solutions to deal with every unanticipated situation or provide predetermined fixes for all the different ways in which systems may fail. We contend that intelligent systems require only a finite set of anomaly-handling strategies to muddle through anomalous situations. We describe a generalized metacognition module that implements such a set of anomaly-handling strategies and that in principle can be attached to any host system to improve the robustness of that system. We report several implemented studies that support our contention.
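The sketch below shows one way a notice-assess-guide cycle might wrap a host system; the anomaly types, the strategy table, and the host interface are illustrative assumptions, not the generalized metacognition module described in the paper.

# Minimal sketch of a metacognitive loop attached to a host system:
# note an expectation violation, assess the anomaly, and guide a repair.
# The anomaly categories and strategy table here are hypothetical.
from typing import Callable, Dict

class MetacognitiveLoop:
    def __init__(self, strategies: Dict[str, Callable[[], None]]):
        # Finite table of anomaly-handling strategies, keyed by anomaly type.
        self.strategies = strategies

    def note(self, expected, observed) -> bool:
        # Notice: did the host system's expectation fail?
        return expected != observed

    def assess(self, expected, observed) -> str:
        # Assess: crudely categorize the anomaly (illustrative heuristic).
        return "missing_output" if observed is None else "wrong_output"

    def guide(self, anomaly_type: str) -> None:
        # Guide: apply the matching repair strategy, or fall back.
        self.strategies.get(anomaly_type, self.strategies["default"])()

# Example interaction with a hypothetical host system.
mcl = MetacognitiveLoop({
    "missing_output": lambda: print("retry the action"),
    "wrong_output": lambda: print("retrain or ask for help"),
    "default": lambda: print("ignore and continue"),
})
expected, observed = "goal reached", None
if mcl.note(expected, observed):
    mcl.guide(mcl.assess(expected, observed))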


A Self-Help Guide For Autonomous Systems

AI Magazine

Humans learn from their mistakes. When things go badly, we notice that something is amiss, figure out what went wrong and why, and attempt to repair the problem. Artificial systems depend on their human designers to program in responses to every eventuality and therefore typically don’t even notice when things go wrong, following their programming over the proverbial, and in some cases literal, cliff. This article describes our past and current work on the Meta-Cognitive Loop, a domain-general approach to giving artificial systems the ability to notice, assess, and repair problems. The goal is to make artificial systems more robust and less dependent on their human designers.