A data-centric approach for assessing progress of Graph Neural Networks
Zhao, Tianqi, Dong, Ngan Thi, Hanjalic, Alan, Khosla, Megha
arXiv.org Artificial Intelligence
Graph Neural Networks (GNNs) have achieved state-of-the-art results in node classification tasks. However, most of these improvements concern multi-class classification, with far less attention paid to the setting in which each node can have multiple labels. The first challenge in studying multi-label node classification is the scarcity of publicly available datasets. To address this, we collected and released three real-world biological datasets and developed a multi-label graph generator with tunable properties. We also argue that traditional notions of homophily and heterophily do not apply well to multi-label scenarios. Therefore, we define homophily and Cross-Class Neighborhood Similarity for multi-label classification and investigate nine collected multi-label datasets. Lastly, we conducted a large-scale comparative study of eight methods across these nine datasets to evaluate current progress in multi-label node classification. We release our code at https://github.com/Tianqi-py/MLGNC.
Jun-18-2024
- Country:
  - Europe > Germany > Lower Saxony > Hanover (0.04)
  - Europe > Netherlands > South Holland > Delft (0.06)
  - North America > United States > New York > New York County > New York City (0.04)
- Genre:
  - Research Report (0.51)