AI Can Identify People Even in Anonymized Datasets


Advancements in AI might soon render phrases such as "hidden in the crowd" or "hiding in plain sight" curious relics of the past, according to new research published last week in Nature Communications. In a paper titled "Interaction data are identifiable even across long periods of time," researchers used geometric deep learning and triplet loss optimization to successfully identify a majority of individuals in an anonymized mobile phone dataset of 40,000 people. The research is notable because fine-grained records of people's interactions, both offline and online, are collected at scale today. Tech giants such as Facebook and Google, telecommunication operators, and other businesses are known to collect this data and either resell it wholesale or leverage it to power data-centric services. The technique relies on the fact that people tend to stick to established social circles and that such regular interactions form stable patterns over time.
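To make the "triplet loss optimization" mentioned above concrete, here is a minimal sketch of the standard triplet loss in NumPy. The embeddings and the idea of embedding a person's weekly interaction pattern are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the anchor's embedding toward a
    'positive' example (same person, different time period) and push
    it away from a 'negative' example (a different person) until the
    gap is at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to same identity
    d_neg = np.linalg.norm(anchor - negative)  # distance to other identity
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings of interaction patterns (hypothetical values, e.g. the
# output of a graph neural network over a person's contact graph).
anchor   = np.array([0.9, 0.1, 0.0])  # person A, week 1
positive = np.array([0.8, 0.2, 0.1])  # person A, week 2
negative = np.array([0.1, 0.9, 0.3])  # person B, week 1

loss = triplet_loss(anchor, positive, negative)
```

Once such a model is trained, re-identification amounts to a nearest-neighbour search: a person's new, "anonymous" interaction pattern is matched to the closest known embedding.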

Making smartphone data anonymous no longer enough: study


Privacy measures that are meant to preserve the anonymity of smartphone users are no longer suitable for the digital age, a study suggested on Tuesday. Vast quantities of data are scooped up from smartphone apps by firms looking to develop products, conduct research or target consumers with adverts. In Europe and many other jurisdictions, companies are legally bound to make this data anonymous, often doing so by removing telltale details like names or phone numbers. But the study in the Nature Communications journal says this is no longer enough to keep identities private. The researchers say people can now be identified with just a few details of how they communicate with an app like WhatsApp. One of the paper's authors, Yves-Alexandre de Montjoye of Imperial College London, told AFP it was time to "reinvent what anonymisation means".
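The claim that "just a few details" of communication behaviour can identify someone can be illustrated with a unicity check: count how many users in a dataset have a combination of behavioural details shared by no one else. The fields and values below are invented for illustration:

```python
from collections import Counter

# Toy 'anonymized' messaging metadata: no names or numbers, just a few
# hypothetical behavioural details per user (most active hour,
# messages-per-day bucket, number of group chats).
users = {
    "u1": (22, "high", 3),
    "u2": (9,  "low",  1),
    "u3": (22, "high", 5),
    "u4": (14, "mid",  3),
    "u5": (9,  "low",  1),  # collides with u2: not re-identifiable
}

counts = Counter(users.values())
unique = [uid for uid, sig in users.items() if counts[sig] == 1]
unicity = len(unique) / len(users)  # fraction uniquely pinned down
```

Here three of five users are uniquely determined by just three coarse details; in real datasets with richer fields, unicity tends to be far higher.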

Researchers Reveal That Anonymized Data Is Easy To Reverse Engineer


Merely existing in the modern world means giving up a wealth of your information to countless institutions and services. While many of these institutions promise to keep your identifiable data as secure and private as possible, they can still--and oftentimes do--share anonymized versions of your data with third parties, whether for research or for profit. But new research indicates that even when data is stripped of identifiable factors, it doesn't take much to piece together certain information and figure out, with fairly high confidence, who the "anonymous" user in the dataset is. In other words, anonymized data is not so anonymous. Researchers at Imperial College London published a paper in Nature Communications on Tuesday exploring how inadequate current techniques for anonymizing datasets are.
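The classic way to reverse-engineer anonymized data is a linkage attack: join the "anonymous" records with a public auxiliary dataset on the quasi-identifiers that anonymization leaves behind. A minimal sketch, with entirely made-up records and field names:

```python
# Hypothetical anonymized records: names removed, but quasi-identifiers
# (ZIP code, birth year, gender) survive alongside the sensitive field.
anonymized = [
    {"zip": "02138", "birth_year": 1985, "gender": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "diagnosis": "asthma"},
]

# Public auxiliary data (e.g. a voter roll) carrying the same fields plus names.
voter_roll = [
    {"name": "Alice", "zip": "02138", "birth_year": 1985, "gender": "F"},
    {"name": "Bob",   "zip": "02139", "birth_year": 1990, "gender": "M"},
]

def link(anon_rows, aux_rows, keys=("zip", "birth_year", "gender")):
    """Re-identify anonymized rows by joining on shared quasi-identifiers."""
    index = {tuple(r[k] for k in keys): r["name"] for r in aux_rows}
    return {
        index[sig]: r["diagnosis"]
        for r in anon_rows
        if (sig := tuple(r[k] for k in keys)) in index
    }

reidentified = link(anonymized, voter_roll)
```

No machine learning is needed here; the attack works whenever the quasi-identifier combination is rare enough to be unique in both datasets.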

Experts warn of privacy risk as US uses GPS to fight coronavirus spread

The Guardian

A transatlantic divide over how to use location data to fight coronavirus highlights the lack of safeguards for Americans' personal data, academics and data scientists have warned. The US Centers for Disease Control and Prevention (CDC) has turned to data provided by the mobile advertising industry to analyse population movements in the midst of the pandemic. Owing to a lack of systematic privacy protections in the US, data collected by advertising companies is often extremely detailed: companies with access to GPS location data, such as weather apps or some e-commerce sites, have been known to sell that data on for ad-targeting purposes. That data provides much more granular information on the location and movement of individuals than the mobile network data received by the UK government from carriers including O2 and BT. While both datasets track individuals at the collection level, GPS data is accurate to within five metres, according to Yves-Alexandre de Montjoye, a data scientist at Imperial College London, while mobile network data is accurate only to about 0.1 km² in city centres and far less precise in sparser areas – the difference between locating an individual to their street and to a specific room in their home.
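The accuracy figures quoted above imply an enormous gap in localization area. A back-of-the-envelope comparison (simple geometry applied to the article's numbers, not a calculation from the article itself):

```python
import math

# GPS: accurate to within ~5 metres, i.e. a disc of radius 5 m.
gps_radius_m = 5.0
gps_area_m2 = math.pi * gps_radius_m ** 2  # roughly 78.5 square metres

# Mobile network data: accurate to ~0.1 km² in city centres.
network_area_m2 = 0.1 * 1_000_000  # 0.1 km² = 100,000 square metres

# How many times coarser the network localization is than GPS.
ratio = network_area_m2 / gps_area_m2
```

Even in a dense city centre, the network-level fix covers an area over a thousand times larger than a GPS fix, which is why GPS data is so much more privacy-sensitive.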

You're very easy to track down, even when your data has been anonymized


The data trail we leave behind us grows all the time. Most of it isn't that interesting--the takeout meal you ordered, that shower head you bought online--but some of it is deeply personal: your medical diagnoses, your sexual orientation, or your tax records. The most common way public agencies protect our identities is anonymization. This involves stripping out obviously identifiable things such as names, phone numbers, email addresses, and so on. Data sets are also altered to be less precise, columns in spreadsheets are removed, and "noise" is introduced to the data.
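The anonymization steps described above (stripping identifiers, coarsening precise fields, adding noise) can be sketched in a few lines. The field names and noise scale are illustrative assumptions, not any agency's actual procedure:

```python
import random

def anonymize(record, noise_scale=2):
    """Sketch of the common anonymization steps: (1) strip direct
    identifiers, (2) coarsen precise fields, (3) add random noise
    to numeric values."""
    out = dict(record)
    for field in ("name", "phone", "email"):  # 1) remove direct identifiers
        out.pop(field, None)
    out["zip"] = out["zip"][:3] + "**"        # 2) generalize location
    out["age"] += random.randint(-noise_scale, noise_scale)  # 3) add noise
    return out

record = {"name": "Alice", "phone": "555-0100", "email": "a@example.com",
          "zip": "02138", "age": 34, "diagnosis": "flu"}
anon = anonymize(record)
```

Note that the sensitive field (here `diagnosis`) is left intact by design, which is exactly why the surviving quasi-identifiers make re-identification attacks worthwhile.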