In keeping with the company's newfound image as a proponent of people's privacy, Federighi first pointed out that Apple does not build user profiles. He briefly mentioned end-to-end encryption before alluding to the privacy challenges of big data analysis, which is essentially the key to improving features and product experiences for almost any tech company. The quick buildup led to the announcement of a solution: "differential privacy." Against the backdrop of a major keynote address, unfamiliar techniques tend to sound new and revolutionary. But differential privacy is a mathematical technique that has been around for a few years within the statistical field.
Privacy describes an individual's anonymity and how safe they feel in a given setting, particularly on the Internet, where it is one of the most sensitive concerns at present. Crowdsourcing is currently the most popular way of collecting data directly from people for many research topics, and it is generally carried out through online sites or portals. The survey process raises some basic issues, however: (a) the survey system should be convincing enough to gain the participants' trust; (b) the processes after the survey should be effective enough to assure the researchers of the participants' 'truthfulness'; (c) the research processes should be robust enough to guarantee that the research architecture model is leak-proof; and (d) the survey system should still produce a 'good result', in the sense of giving insight into the problem, in spite of the 'noise' in the data. Statistical databases therefore offer a large research field, since a leak of even a small amount of data may lead to the identification of a person, which may be a concern in his or her personal life.
Data can help businesses, organizations and societies solve difficult problems, but some of the most useful data contains personal information that can't be used without compromising privacy. That's why Microsoft Research and collaborators developed differential privacy, which safeguards the privacy of individuals while making useful data available for research and decision making. Today, I am excited to share some of what we've learned over the years and what we're working toward, as well as to announce a new name for our open source platform for differential privacy – a major part of our commitment to collaborate around this important topic. Differential privacy consists of two components: statistical noise and a privacy-loss budget. Statistical noise masks the contribution of individual data points within a dataset but does not impact the overall accuracy of the dataset, while a privacy-loss budget keeps track of how much information has been revealed through various queries to ensure that aggregate queries don't inadvertently reveal private information.
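The two components described above can be illustrated with a minimal Python sketch. It is not the API of the open source platform mentioned here; the class `PrivateCounter`, its method names, and the sample data are hypothetical, and the noise is the standard Laplace mechanism applied to counting queries (which have sensitivity 1). Each query spends part of a fixed privacy-loss budget, and queries are refused once the budget is used up.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


class PrivateCounter:
    """Hypothetical helper: noisy counts under a fixed privacy-loss budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining = total_epsilon  # privacy-loss budget still available
        self._rng = random.Random(seed)

    def count(self, predicate, epsilon):
        """Return a noisy count of matching records, spending `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy-loss budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one record changes a count by at most 1,
        # so Laplace noise with scale 1/epsilon is sufficient here.
        return true_count + laplace_noise(1.0 / epsilon, self._rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 48]  # made-up illustrative data
    counter = PrivateCounter(ages, total_epsilon=1.0, seed=0)
    print(counter.count(lambda age: age >= 40, epsilon=0.5))
    print(counter.count(lambda age: age < 30, epsilon=0.5))
    print("budget left:", counter.remaining)
```

A smaller `epsilon` per query gives stronger privacy but noisier answers, which is why the budget has to be divided carefully among the queries an analyst wants to ask.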
As the second installment in this series of posts, I will touch on the topic of privacy in data science and algorithms. In particular, I'm going to discuss a relatively novel concept of privacy called differential privacy that promises, similar to algorithmic fairness, a way of quantifying the privacy of AI algorithms. When we, as humans, talk about privacy, we mostly refer to a desire to not be observed by others. However, what does privacy mean in the context of algorithms that "observe" us by using data that has information on us? In a very general sense, we could say that privacy will be preserved if, after analysis, the algorithm that used our data (e.g. an application on our smartphones) doesn't know anything about us.
We investigate the problem of node clustering under privacy constraints when representing a dataset as a graph. Our contribution is threefold. First, we formally define the concept of differential privacy for structured databases such as graphs, and give an alternative definition based on a new neighborhood notion between graphs. This definition is adapted to particular frameworks that can be met in various application fields such as genomics, the World Wide Web, population surveys, etc. Second, we introduce a new algorithm to tackle the issue of privately releasing an approximated minimum spanning tree topology for a simple, undirected, weighted graph. It provides a simple way of producing the topology of a private almost minimum spanning tree which outperforms, in most cases, the state-of-the-art "Laplace mechanism" in terms of weight-approximation error. Finally, we propose a theoretically motivated method combining a sanitizing mechanism (such as Laplace or our new algorithm) with a Minimum Spanning Tree (MST)-based clustering algorithm. It provides an accurate method for node clustering in a graph while keeping private the sensitive information contained in the edge weights of the graph. We provide some theoretical results on the robustness of an almost minimum spanning tree construction for Laplace sanitizing mechanisms. These results show which conditions the graph weights should satisfy for the nodes to be considered as forming well-separated clusters, both for Laplace and for our algorithm as the sanitizing mechanism. The method has been experimentally evaluated on simulated data, and preliminary results show the good behavior of the algorithm while identifying well-separated clusters.
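The combination of a sanitizing mechanism with MST-based clustering can be sketched in Python as follows. This is only an illustration of the Laplace baseline, not the new algorithm introduced in the paper. It assumes that the graph topology is public, that only edge weights are private, and that neighboring graphs differ by at most `sensitivity` in total edge weight (L1 norm), which may not match the paper's neighborhood notion. All function names and the toy graph are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as a difference of exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def sanitize_weights(edges, epsilon, sensitivity, rng):
    """Laplace mechanism: add independent noise to every edge weight."""
    scale = sensitivity / epsilon
    return [(u, v, w + laplace_noise(scale, rng)) for u, v, w in edges]


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_mst(num_nodes, edges):
    """Return the edges of a minimum spanning tree (Kruskal's algorithm)."""
    parent = list(range(num_nodes))
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def mst_clusters(num_nodes, tree, k):
    """Cut the k-1 heaviest tree edges and return the resulting components."""
    keep = max(len(tree) - (k - 1), 0)
    kept = sorted(tree, key=lambda e: e[2])[:keep]
    parent = list(range(num_nodes))
    for u, v, _ in kept:
        parent[_find(parent, u)] = _find(parent, v)
    groups = {}
    for node in range(num_nodes):
        groups.setdefault(_find(parent, node), set()).add(node)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Toy graph: two tight triangles joined by one heavy edge.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.2),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.1),
             (2, 3, 10.0)]
    rng = random.Random(0)
    noisy = sanitize_weights(edges, epsilon=1.0, sensitivity=1.0, rng=rng)
    tree = kruskal_mst(6, noisy)
    print(mst_clusters(6, tree, k=2))
```

Because the spanning tree and the clusters are computed only from the noisy weights, they are post-processing of the sanitized graph and do not consume additional privacy budget. Whether the recovered clusters match the true ones depends on how well separated the clusters are relative to the noise scale.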