Edge-aware baselines for ogbn-proteins in PyTorch Geometric: species-wise normalization, post-hoc calibration, and cost-accuracy trade-offs
Stanković, Aleksandar, Lisica, Dejan
–arXiv.org Artificial Intelligence
We present reproducible, edge-aware baselines for ogbn-proteins in PyTorch Geometric (PyG). We study two system choices that dominate practice: (i) how 8-dimensional edge evidence is aggregated into node inputs, and (ii) how edges are used inside message passing. Our strongest baseline is GraphSAGE with sum-based edge-to-node features. We compare LayerNorm (LN), BatchNorm (BN), and a species-aware Conditional LayerNorm (CLN), and report compute cost (time, VRAM, parameters) together with accuracy (ROC-AUC) and decision quality. In our primary experimental setup (hidden size 512, 3 layers, 3 seeds), sum consistently beats mean and max; BN attains the best AUC, while CLN matches the AUC frontier with better thresholded F1. Finally, post-hoc per-label temperature scaling plus per-label thresholds substantially improves micro-F1 and expected calibration error (ECE) with negligible AUC change, and light label-correlation smoothing yields small additional gains. We release standardized artifacts and scripts used for all of the runs presented in the paper.
arXiv.org Artificial Intelligence
Nov-18-2025
- Country:
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.05)
- Genre:
- Research Report (0.40)
- Technology: