superiority
- North America > United States > Wisconsin (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Michigan (0.04)
- (3 more...)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
When Do Graph Neural Networks Help with Node Classification? Investigating the Homophily Principle on Node Distinguishability
The homophily principle, i.e., that nodes with the same labels are more likely to be connected, has been believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over neural networks on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs persists as long as nodes from the same class share similar neighborhood patterns. However, this argument considers only intra-class Node Distinguishability (ND) and neglects inter-class ND, which yields an incomplete understanding of the effect of homophily on GNNs. In this paper, we first demonstrate this deficiency with examples and argue that the ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formalize this idea and study ND in depth, we propose the Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and the negative generalized Jeffreys divergence, to quantify ND.
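The CSBM-H model itself is not specified in this abstract; as a rough illustration of the underlying idea, the following is a minimal sketch of a generic two-class contextual stochastic block model (function name, parameterization, and the 1-D Gaussian features are assumptions, not the authors' construction):

```python
import numpy as np

def sample_csbm(n, p_intra, p_inter, mu, sigma, rng=None):
    """Sample a two-class contextual SBM: labels, adjacency, Gaussian features.

    Same-class node pairs connect with probability p_intra, cross-class pairs
    with p_inter; each node's feature is drawn from N(+mu, sigma^2) for class 1
    and N(-mu, sigma^2) for class 0.
    """
    rng = np.random.default_rng(rng)
    y = rng.integers(0, 2, size=n)                    # binary class labels
    same = (y[:, None] == y[None, :])                 # intra-class pair mask
    probs = np.where(same, p_intra, p_inter)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample upper triangle only
    A = (upper | upper.T).astype(int)                 # symmetric, no self-loops
    x = rng.normal(np.where(y == 1, mu, -mu), sigma)  # 1-D contextual feature
    return y, A, x

y, A, x = sample_csbm(200, p_intra=0.1, p_inter=0.02, mu=1.0, sigma=1.0, rng=0)
```

Raising p_inter toward p_intra lowers homophily while the feature means keep the classes distinguishable, which is the regime the abstract's ND metrics are meant to quantify.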
Lookaround Optimizer: k steps around, 1 step average
Weight averaging (WA) is an active research topic due to its simplicity in ensembling deep networks and its effectiveness in promoting generalization. Existing weight-averaging approaches, however, are often carried out along only one training trajectory in a post-hoc manner (i.e., the weights are averaged after the entire training process has finished), which significantly reduces the diversity between networks and thus impairs the effectiveness. In this paper, inspired by weight averaging, we propose Lookaround, a straightforward yet effective SGD-based optimizer that leads to flatter minima with better generalization.
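The title's "k steps around, 1 step average" pattern can be sketched as an optimizer loop: from a shared iterate, several trajectories each take k SGD steps, and their weights are then averaged back into one iterate. This is only an illustrative NumPy sketch of that loop shape, with toy gradient oracles standing in for augmented data views (all names here are hypothetical, not the paper's implementation):

```python
import numpy as np

def lookaround_sgd(grad_fns, w0, lr=0.1, k=5, rounds=20):
    """Sketch of a 'k steps around, 1 step average' loop.

    Each gradient oracle in grad_fns stands in for one trajectory (e.g. one
    data augmentation). Every round, each trajectory takes k SGD steps from
    the shared weights, then the branch weights are averaged.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(rounds):
        branches = []
        for grad in grad_fns:          # "around" phase: k steps per branch
            wi = w.copy()
            for _ in range(k):
                wi -= lr * grad(wi)
            branches.append(wi)
        w = np.mean(branches, axis=0)  # "average" phase: one averaging step
    return w

# Two slightly different quadratic losses stand in for two augmented views;
# their minimizers are +1 and -1, so the averaged iterate heads toward 0.
g1 = lambda w: 2 * (w - 1.0)
g2 = lambda w: 2 * (w + 1.0)
w_star = lookaround_sgd([g1, g2], w0=np.array([5.0]))
```

Averaging every round, rather than once after training, is what keeps the branches close enough to average meaningfully while still letting them diverge for k steps.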
Understanding Non-linearity in Graph Neural Networks from the Bayesian-Inference Perspective
Graph neural networks (GNNs) have shown superiority in many prediction tasks over graphs due to their impressive capability of capturing nonlinear relations in graph-structured data. However, for node classification tasks, only marginal improvement of GNNs over their linear counterparts has often been observed in practice. Previous works provide little understanding of this phenomenon. In this work, we resort to Bayesian learning to give an in-depth investigation of the functions of non-linearity in GNNs for node classification tasks. Given a graph generated from the statistical model CSBM, we observe that the maximum a posteriori estimation of a node label given its own and its neighbors' attributes consists of two types of non-linearity: the transformation of node attributes and a ReLU-activated feature aggregation from neighbors. The latter surprisingly matches the type of non-linearity used in many GNN models. By further imposing a Gaussian assumption on node attributes, we prove that the superiority of those ReLU activations is significant only when the node attributes are far more informative than the graph structure, which nicely explains previous empirical observations. A similar argument holds when there is a distribution shift of node attributes between the training and testing datasets. Finally, we verify our theory on both synthetic and real-world networks.
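The contrast the abstract draws, between a linear neighbor aggregation and a ReLU-activated one, can be made concrete in a few lines. The following is a minimal NumPy sketch under the assumption that the ReLU is applied to neighbor features before summation; the abstract does not pin down the exact form, so this is an illustration, not the authors' estimator:

```python
import numpy as np

def linear_aggregate(A, x):
    """Plain linear sum-aggregation of neighbor features."""
    return A @ x

def relu_aggregate(A, x):
    """ReLU applied to neighbor features before summation, one plausible
    reading of the ReLU-activated aggregation described in the abstract."""
    return A @ np.maximum(x, 0.0)

# Two nodes connected to each other; node 0 carries a negative feature.
A = np.array([[0, 1], [1, 0]])
x = np.array([-1.0, 2.0])
lin = linear_aggregate(A, x)   # negative evidence passes through
rel = relu_aggregate(A, x)     # negative evidence is clipped to zero
```

The ReLU variant discards negatively signed neighbor evidence, which is why, per the abstract, it helps mainly when node attributes are much more informative than the graph structure.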
'Fear really drives him': is Alex Karp of Palantir the world's scariest CEO?
'Palantir is the embodiment, in a lot of ways, of him': Alex Karp. His company is potentially creating the ultimate state-surveillance tool, and Karp has recently been on a striking political and philosophical journey. In a recent interview, Alex Karp said that his company Palantir was "the most important software company in America and therefore in the world". He may well be right.
- Europe > United Kingdom (0.29)
- North America > United States > California (0.05)
- Europe > Ukraine (0.05)
- (13 more...)
- Leisure & Entertainment > Sports (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- (7 more...)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Banking & Finance (0.67)
- Information Technology (0.46)
26b58a41da329e0cbde0cbf956640a58-AuthorFeedback.pdf
We thank all the reviewers for their constructive feedback. Our revision will incorporate all the points detailed below. The algorithms in [1] are not for composition optimization but for general saddle-point problems. Fig. (a) below shows the superiority of our algorithms over [1] (Saddle-SVRG). R#2: Compare to other composition optimization methods.
15212f24321aa2c3dc8e9acf820f3c15-AuthorFeedback.pdf
We would like to thank all the reviewers for their insightful comments. The changes mentioned in our responses below have been incorporated in the revised version of the paper. Regarding the contribution of the paper, our Level-1 theory of mind (Section 2.2) is similar to Ref. [23]. That is not true for the opposite case. The POMDP model always generates a deterministic policy; it only changes the likelihood function of the model. Therefore, we do not need any new parameters to measure the accuracy of our model.