link prediction algorithm
Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape
Niu, Haoran, Barber, K. Suzanne
--It is difficult for individuals and organizations to protect personal information without a fundamental understanding of relative privacy risks. By analyzing over 5,000 empirical identity theft and fraud cases, this research identifies which types of personal data are exposed, how frequently exposures occur, and what the consequences of those exposures are. We construct an Identity Ecosystem graph--a foundational, graph-based model in which nodes represent personally identifiable information (PII) attributes and edges represent empirical disclosure relationships between them (e.g., the probability that one PII attribute is exposed due to the exposure of another). Leveraging this graph structure, we develop a privacy risk prediction framework that uses graph theory and graph neural networks to estimate the likelihood of further disclosures when certain PII attributes are compromised. The results show that our approach effectively answers the core question: Can the disclosure of a given identity attribute possibly lead to the disclosure of another attribute? Different individuals and organizations have different sets of personally identifiable information (PII), and therefore have different perspectives on which PII attributes are more vulnerable, more valuable, and in greater need of protection. An individual's PII includes personal data in four different categories--What you Know (e.g., name, address, phone number, mother's maiden name), What you Have (e.g., driver's license, Social Security Card, employee ID, passport), What you Are (e.g., fingerprint, voice, facial image), and What you Do (e.g., patterns of life such as websites visited, GPS locations visited, phone logs) [1]. Protecting PII data can be costly and time-consuming. Research has uncovered various strategies to reduce risks of unintended data disclosure [2], including statistical disclosure limitation (SDL) techniques commonly used by national statistical agencies before releasing public-use data sets.
New Perspectives on the Evaluation of Link Prediction Algorithms for Dynamic Graphs
Romero, Raphaรซl, De Bie, Tijl, Lijffijt, Jefrey
There is a fast-growing body of research on predicting future links in dynamic networks, with many new algorithms. Some benchmark data exists, and performance evaluations commonly rely on comparing the scores of observed network events (positives) with those of randomly generated ones (negatives). These evaluation measures depend on both the predictive ability of the model and, crucially, the type of negative samples used. Besides, as generally the case with temporal data, prediction quality may vary over time. This creates a complex evaluation space. In this work, we catalog the possibilities for negative sampling and introduce novel visualization methods that can yield insight into prediction performance and the dynamics of temporal networks. We leverage these visualization tools to investigate the effect of negative sampling on the predictive performance, at the node and edge level. We validate empirically, on datasets extracted from recent benchmarks that the error is typically not evenly distributed across different data segments. Finally, we argue that such visualization tools can serve as powerful guides to evaluate dynamic link prediction methods at different levels.
Using Adamic-Adar Index Algorithm to Predict Volunteer Collaboration: Less is More
Wu, Chao, Chen, Peng, Yin, Baiqiao, Lin, Zijuan, Jiang, Chen, Yu, Di, Zou, Changhong, Lui, Chunwang
Social networks exhibit a complex graph-like structure due to the uncertainty surrounding potential collaborations among participants. Machine learning algorithms possess generic outstanding performance in multiple real-world prediction tasks. However, whether machine learning algorithms outperform specific algorithms designed for graph link prediction remains unknown to us. To address this issue, the Adamic-Adar Index (AAI), Jaccard Coefficient (JC) and common neighbour centrality (CNC) as representatives of graph-specific algorithms were applied to predict potential collaborations, utilizing data from volunteer activities during the Covid-19 pandemic in Shenzhen city, along with the classical machine learning algorithms such as random forest, support vector machine, and gradient boosting as single predictors and components of ensemble learning. This paper introduces that the AAI algorithm outperformed the traditional JC and CNC, and other machine learning algorithms in analyzing graph node attributes for this task.
Stacking Models for Nearly Optimal Link Prediction in Complex Networks
Ghasemian, Amir, Hosseinmardi, Homa, Galstyan, Aram, Airoldi, Edoardo M., Clauset, Aaron
Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speedup the collection of network data and improve the validity of network models. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. We answer these questions by systematically evaluating 203 individual link predictor algorithms, representing three popular families of methods, applied to a large corpus of 548 structurally diverse networks from six scientific domains. We first show that individual algorithms exhibit a broad diversity of prediction errors, such that no one predictor or family is best, or worst, across all realistic inputs. We then exploit this diversity via meta-learning to construct a series of "stacked" models that combine predictors into a single algorithm. Applied to a broad range of synthetic networks, for which we may analytically calculate optimal performance, these stacked models achieve optimal or nearly optimal levels of accuracy. Applied to real-world networks, stacked models are also superior, but their accuracy varies strongly by domain, suggesting that link prediction may be fundamentally easier in social networks than in biological or technological networks. These results indicate that the state-of-the-art for link prediction comes from combining individual algorithms, which achieves nearly optimal predictions. We close with a brief discussion of limitations and opportunities for further improvement of these results.
Strategic Social Network Analysis
Michalak, Tomasz P. (University of Warsaw and University of Oxford) | Rahwan, Talal (Masdar Institute of Science and Technology) | Wooldridge, Michael
How can individuals and communities protect their privacy against social network analysis tools? How do criminals or terrorists organizations evade detection by such tools? Under which conditions can these tools be made strategy proof? These fundamental questions have attracted little attention in the literature to date, as most social network analysis tools are built around the assumption that individuals or groups in a network do not act strategically to evade such tools. With this in mind, we outline in this paper a new paradigm for social network analysis, whereby the strategic behaviour of network actors is explicitly modeled. Addressing this research challenge has various implications. For instance, it may allow two individuals to keep their relationship secret or private. It may also allow members of an activist group to conceal their membership, or even conceal the existence of their group from authoritarian regimes. Furthermore, it may assist security agencies and counter terrorism units in understanding the strategies that covert organizations use to escape detection, and give rise to new strategy-proof countermeasures.