Privacy Classification Systems: Recall and Precision Optimization as Enabler of Trusted Information Sharing
Hogan, Christopher (H5) | Bauer, Robert S. (H5)
Information is shared more extensively when users can confidently classify all of their information according to its desired degree of disclosure prior to transmission. While high-quality classification is relatively straightforward for structured data (e.g., credit card numbers, cookies, "confidential" reports), most consumer and business information is unstructured (e.g., Facebook posts, corporate email). All current technological approaches to classifying unstructured information seek to identify only that information having the desired characteristics (i.e., to maximize the percentage of filtered content that requires privacy protection). This focus on boosting classifier Precision (P) causes technology solutions to miss sensitive information [i.e., Recall (R) is compromised for the sake of improving P]. Such privacy protection will fall short of user expectations no matter how "intelligent" the technology may be in extending beyond keywords to user meaning. Systems must simultaneously optimize both P and R in order to protect privacy sufficiently to encourage the free flow of personal and corporate information. This requires a socio-technical methodology in which the user is intimately involved in iterative privacy improvement. The approach is a general one: the classifier can be modified at any time when sampled measurements of P and R indicate that a change is needed. Matching the ever-evolving user privacy model to the technology solution (e.g., active machine learning) affords a technique for building and maintaining user trust.
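As an illustration of the sampling-based measurement described in the abstract, the following minimal sketch computes P and R from a user-reviewed sample and checks whether the classifier should be revisited. The names `SampleCounts` and `needs_update`, the 0.9 thresholds, and the example counts are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Counts from a user-reviewed sample of classified items (hypothetical structure)."""
    true_positives: int   # flagged as sensitive, and the user agrees
    false_positives: int  # flagged as sensitive, but the user disagrees
    false_negatives: int  # not flagged, but the user considers it sensitive


def precision(c: SampleCounts) -> float:
    """Fraction of flagged items that actually require privacy protection."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: SampleCounts) -> float:
    """Fraction of sensitive items that the classifier actually flagged."""
    sensitive = c.true_positives + c.false_negatives
    return c.true_positives / sensitive if sensitive else 0.0


def needs_update(c: SampleCounts, min_precision: float = 0.9,
                 min_recall: float = 0.9) -> bool:
    """Assumed policy: revisit the classifier if either measure falls below its threshold."""
    return precision(c) < min_precision or recall(c) < min_recall


# Illustrative counts only: a classifier tuned for precision that misses sensitive items.
counts = SampleCounts(true_positives=45, false_positives=5, false_negatives=30)
print(f"P = {precision(counts):.2f}, R = {recall(counts):.2f}")
print("update classifier:", needs_update(counts))
```

With these made-up counts, precision is high while recall is low, which is the failure mode the abstract attributes to precision-focused approaches; checking both measures is what triggers the update.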
Mar-22-2010
- Country:
- North America > United States
- California > San Francisco County
- San Francisco (0.14)
- Texas (0.15)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: