A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs
Wang, Yun, Kosyfaki, Chrysanthi, Amer-Yahia, Sihem, Cheng, Reynold
Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing in graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses in attributed graphs. We develop a sampling-based hypothesis testing framework, which can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in a hypothesis. We further optimize its time efficiency and propose PHASEopt. Experiments on real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling in terms of accuracy and time efficiency.
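To make the idea of sampling-based testing of a node hypothesis concrete, the following is a minimal sketch and not the PHASE sampler described above. It assumes a hypothesis-agnostic simple random walk, a toy 4-regular graph (so the walk's stationary distribution is uniform over nodes), and a two-sided one-sample z-test on a numeric node attribute. All names (`build_toy_graph`, `random_walk_sample`, `z_test_mean`) and the synthetic attribute values are hypothetical. Consecutive random-walk samples are correlated, so the p-value here is only illustrative.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def build_toy_graph(n, rng):
    """Hypothetical attributed graph: node i links to i±1 and i±2 (mod n)."""
    adj = {i: [(i - 1) % n, (i + 1) % n, (i - 2) % n, (i + 2) % n]
           for i in range(n)}
    # Synthetic numeric attribute per node, for illustration only.
    attr = {i: rng.gauss(30.0, 5.0) for i in range(n)}
    return adj, attr


def random_walk_sample(adj, start, n_steps, rng):
    """Return the nodes visited by a simple random walk (start included)."""
    visited = [start]
    current = start
    for _ in range(n_steps):
        current = rng.choice(adj[current])  # uniform choice among neighbors
        visited.append(current)
    return visited


def z_test_mean(values, mu0):
    """Two-sided one-sample z-test of H0: population mean equals mu0."""
    n = len(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the sample mean
    z = (mean(values) - mu0) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


rng = random.Random(0)
adj, attr = build_toy_graph(200, rng)
sample = random_walk_sample(adj, start=0, n_steps=500, rng=rng)
values = [attr[v] for v in sample]
z, p = z_test_mean(values, mu0=30.0)
print(f"z = {z:.3f}, p = {p:.3f}")
```

On a graph with non-uniform degrees, a simple random walk oversamples high-degree nodes, so the sampled values would need reweighting or a different sampler before the test is applied.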
Mar-19-2024
- Country:
- Oceania > Australia
- North America
- United States
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- North Carolina > Wake County
- Raleigh (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Colorado > Denver County
- Denver (0.04)
- California > San Diego County
- San Diego (0.04)
- Canada
- Quebec > Montreal (0.04)
- Ontario > Waterloo Region
- Waterloo (0.04)
- Europe
- United Kingdom > England
- Greater London > London (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Italy > Tuscany
- Pisa Province > Pisa (0.04)
- France > Auvergne-Rhône-Alpes
- Asia
- China > Hong Kong (0.04)
- Macao (0.04)
- Middle East > Iran
- Tehran Province > Tehran (0.04)
- Genre:
- Research Report > Experimental Study (0.47)
- Industry:
- Health & Medicine (0.93)
- Technology: