Neural Information Processing Systems
Table S3: PAvPU for graphs.

GAT              83.00  72.50  77.26
BAM (remove KL)  83.39  72.91  78.50
BAM-WC           83.81  73.52  78.82

training, which is why it can be used at test time to help predict the outputs. We introduce a contextual prior distribution to impose further regularization on the attention distributions. We agree that if the prior and the variational posterior are set to be the same, the KL term in the ELBO vanishes and the regularization disappears. Also, in BAM the attention weights are data-dependent local variables. This approach is more computationally efficient than the convention in Bayesian neural networks, where network parameters such as θ are modeled as globally shared random variables (i.e., not data dependent), as the latter approach
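The role of the KL regularizer discussed above can be illustrated with a minimal sketch. This is not BAM's actual model; it simply assumes, for illustration, univariate Gaussian variational posterior and prior over an attention weight, and uses the closed-form Gaussian KL to show that the regularizer vanishes exactly when prior and posterior coincide:

```python
import math

def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate Gaussians
    q = N(mu_q, sigma_q^2) and p = N(mu_p, sigma_p^2)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

# Prior identical to the posterior: the KL term in the ELBO is zero,
# so the regularization disappears, as noted above.
kl_same = kl_normal(0.0, 1.0, 0.0, 1.0)       # 0.0

# A contextual prior that differs from the posterior (here a shifted
# mean, a hypothetical choice) keeps the regularizer strictly positive.
kl_diff = kl_normal(0.0, 1.0, 0.5, 1.0)       # > 0
```

The same argument carries over term-by-term when the KL is summed over data-dependent local attention variables, which is cheaper than placing a shared distribution over all network parameters θ.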