Apple's 'Differential Privacy' Is About Collecting Your Data--But Not Your Data

@machinelearnbot

Apple, like practically every mega-corporation, wants to know as much as possible about its customers. But it's also marketed itself as Silicon Valley's privacy champion, one that--unlike so many of its advertising-driven competitors--wants to know as little as possible about you. So perhaps it's no surprise that the company has now publicly boasted about its work in an obscure branch of mathematics that deals with exactly that paradox. At the keynote address of Apple's Worldwide Developers Conference in San Francisco on Monday, the company's senior vice president of software engineering, Craig Federighi, gave his familiar nod to privacy, emphasizing that Apple doesn't assemble user profiles, does end-to-end encrypt iMessage and FaceTime, and tries to keep as much of the computation involving your private information as possible on your personal device rather than on an Apple server. But Federighi also acknowledged the growing reality that collecting user information is crucial to making good software, especially in an age of big data analysis and machine learning.
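
The "obscure branch of mathematics" here is differential privacy: add carefully calibrated randomness to each user's report so that no individual record can be taken at face value, while statistics over many users remain accurate. The sketch below shows the oldest building block of that idea, randomized response; it is a minimal illustration of the general technique, not a description of Apple's actual implementation, and the parameter names are hypothetical.

```python
import random

def randomized_response(truth: bool, p_honest: float = 0.75) -> bool:
    """Report the true bit with probability p_honest, otherwise a fair coin flip.

    Any single report is plausibly deniable, yet the known bias lets the
    collector recover the population-wide rate of the sensitive attribute.
    """
    if random.random() < p_honest:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports: list, p_honest: float = 0.75) -> float:
    """Invert the known noise: observed = p_honest * true + (1 - p_honest) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_honest) * 0.5) / p_honest

if __name__ == "__main__":
    true_rate = 0.30  # fraction of simulated users with some sensitive attribute
    reports = [randomized_response(random.random() < true_rate)
               for _ in range(100_000)]
    print(f"estimated rate: {estimate_true_rate(reports):.3f}")  # typically ~0.30
```

With 100,000 simulated users the estimate usually lands within a fraction of a percentage point of the true 30% rate, even though no individual answer can be trusted; that tension between useful aggregates and deniable individual reports is the paradox the keynote alluded to.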


Forget privacy: you're terrible at targeting anyway

#artificialintelligence

I don't mind letting your programs see my private data as long as I get something useful in exchange. But that's not what happens. A former co-worker once told me: "Everyone loves collecting data, but nobody loves analyzing it later." The claim sounds almost shocking, but anyone who has been involved in data collection and analysis has seen it firsthand. It starts with a brilliant idea: we'll collect information about every click someone makes on every page in our app! And we'll track how long they hesitate over a particular choice! And how often they use the back button!


You're very easy to track down, even when your data has been anonymized

#artificialintelligence

The data trail we leave behind us grows all the time. Most of it isn't that interesting--the takeout meal you ordered, that shower head you bought online--but some of it is deeply personal: your medical diagnoses, your sexual orientation, or your tax records. The most common way public agencies protect our identities is anonymization. This involves stripping out obviously identifiable things such as names, phone numbers, email addresses, and so on. Data sets are also altered to be less precise, columns in spreadsheets are removed, and "noise" is introduced to the data.
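
As a rough illustration of that pipeline, the sketch below (using pandas, with a hypothetical table whose columns are name, email, phone, zip_code, birth_date, and spend) drops the direct identifiers, coarsens the quasi-identifiers, and injects numeric noise. It mirrors the steps described above, not any particular agency's procedure.

```python
import numpy as np
import pandas as pd

def naive_anonymize(df: pd.DataFrame) -> pd.DataFrame:
    """Strip direct identifiers, reduce precision, and add noise.

    The column names are hypothetical placeholders for whatever a real
    data set would contain.
    """
    out = df.drop(columns=["name", "email", "phone"])      # remove direct identifiers
    out["zip_code"] = out["zip_code"].str[:3] + "**"       # keep only a coarse region
    out["birth_year"] = pd.to_datetime(out["birth_date"]).dt.year
    out = out.drop(columns=["birth_date"])                 # keep the year, not the date
    out["spend"] = out["spend"] + np.random.normal(0, 5.0, len(out))  # inject noise
    return out

records = pd.DataFrame({
    "name": ["Ada Lovelace"],
    "email": ["ada@example.com"],
    "phone": ["555-0100"],
    "zip_code": ["94103"],
    "birth_date": ["1985-12-10"],
    "spend": [142.50],
})
print(naive_anonymize(records))
```

Even the "anonymized" row still carries a ZIP-code prefix, a birth year, and a spending pattern; it is precisely such surviving quasi-identifiers that make supposedly anonymized records easy to trace back to a person.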


Minimax Optimal Estimation of Approximate Differential Privacy on Neighboring Databases

Neural Information Processing Systems

Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of error would be greatly reduced if we had a data-driven approach to verifying privacy guarantees from black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantee and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance.
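
The polynomial-approximation estimator itself is not reproduced here; to make the black-box setting concrete, the sketch below is only a naive plug-in baseline for the same task: run a mechanism repeatedly on two neighboring inputs, build empirical output histograms, and report the worst-case log-ratio as a pure-epsilon privacy-loss estimate. The mechanism and helper names are illustrative.

```python
import math
import random
from collections import Counter

def empirical_epsilon(mechanism, db1, db2, n_samples=100_000):
    """Naive plug-in estimate of the privacy loss of a discrete mechanism.

    Samples the mechanism on two neighboring databases and returns the
    largest log-ratio of empirical output probabilities. This is a biased
    baseline, not the minimax-optimal estimator proposed in the paper.
    """
    counts1 = Counter(mechanism(db1) for _ in range(n_samples))
    counts2 = Counter(mechanism(db2) for _ in range(n_samples))
    eps = 0.0
    for outcome in counts1.keys() & counts2.keys():
        p = counts1[outcome] / n_samples
        q = counts2[outcome] / n_samples
        eps = max(eps, abs(math.log(p / q)))
    return eps

def noisy_bit(bit, flip_prob=0.125):
    """Toy mechanism: report the sensitive bit, flipped with probability flip_prob."""
    return bit ^ (random.random() < flip_prob)

# Neighboring "databases" here are just the two values of a single sensitive bit.
print(empirical_epsilon(noisy_bit, True, False))  # close to ln(0.875 / 0.125) ~ 1.95
```

The plug-in estimate degrades when some outcomes are rare, since their empirical probabilities are noisy or missing; that is the kind of bias-variance problem the paper's carefully chosen polynomial degree is designed to balance.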