Goto

Collaborating Authors

 Xu, Tianning


Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks

arXiv.org Artificial Intelligence

Multiple sampling-based methods have been developed for approximating and accelerating node embedding aggregation in graph convolutional networks (GCNs) training. Among them, a layer-wise approach recursively performs importance sampling to select neighbors jointly for existing nodes in each layer. This paper revisits the approach from a matrix approximation perspective, and identifies two issues in the existing layer-wise sampling methods: suboptimal sampling probabilities and estimation biases induced by sampling without replacement. To address these issues, we accordingly propose two remedies: a new principle for constructing sampling probabilities and an efficient debiasing algorithm. The improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks.


On Variance Estimation of Random Forests with Infinite-Order U-statistics

arXiv.org Artificial Intelligence

Infinite-order U-statistics (IOUS) has been used extensively on subbagging ensemble learning algorithms such as random forests to quantify its uncertainty. While normality results of IOUS have been studied extensively, its variance estimation approaches and theoretical properties remain mostly unexplored. Existing approaches mainly utilize the leading term dominance property in the Hoeffding decomposition. However, such a view usually leads to biased estimation when the kernel size is large or the sample size is small. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties, especially the ratio consistency, have never been studied. These limitations lead to unguaranteed performances of constructed confidence intervals. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator's finite sample performance. Extensive simulation studies show that our estimators enjoy lower bias and archive targeted coverage rates.