 Karsten Borgwardt


Wasserstein Weisfeiler-Lehman Graph Kernels

Neural Information Processing Systems

Most graph kernels are instances of the class of R-convolution kernels, which measure the similarity of objects by comparing their substructures. Despite their empirical success, most graph kernels use a naive aggregation of the final set of substructures, usually a sum or average, thereby potentially discarding valuable information about the distribution of individual components. Furthermore, only a few of these approaches can be extended to continuously attributed graphs. We propose a novel method that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows finding subtler differences in data sets by treating graphs as high-dimensional objects rather than as simple means. We further propose a Weisfeiler-Lehman-inspired embedding scheme for graphs with continuous node attributes and weighted edges, enhance it with the computed Wasserstein distance, and thereby improve the state-of-the-art prediction performance on several graph classification tasks.
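
To make the two ingredients concrete, here is a minimal Python sketch: a WL-style propagation step that builds continuous node embeddings, followed by the exact Wasserstein distance between the two graphs' node embedding distributions, computed with the POT library (ot.emd2). The function names and the specific averaging rule are illustrative simplifications of the scheme the abstract describes, not the authors' reference implementation.

    import numpy as np
    import ot                                   # POT: Python Optimal Transport
    from scipy.spatial.distance import cdist

    def wl_continuous_embeddings(X, A, num_iterations=2):
        # X: (n, d) continuous node attributes; A: (n, n) weighted adjacency.
        # Each iteration averages a node's embedding with the degree-normalised
        # embeddings of its neighbours; embeddings from all iterations are
        # concatenated (illustrative variant of the WL-inspired scheme).
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
        H, per_iteration = X, [X]
        for _ in range(num_iterations):
            H = 0.5 * (H + (A @ H) / deg)
            per_iteration.append(H)
        return np.hstack(per_iteration)

    def wasserstein_graph_distance(X1, A1, X2, A2, num_iterations=2):
        # Treat each graph as a uniform distribution over its node embeddings
        # and solve the exact optimal transport problem between the two clouds.
        E1 = wl_continuous_embeddings(X1, A1, num_iterations)
        E2 = wl_continuous_embeddings(X2, A2, num_iterations)
        M = cdist(E1, E2)                       # Euclidean ground distances
        a = np.full(len(E1), 1.0 / len(E1))     # uniform node weights, graph 1
        b = np.full(len(E2), 1.0 / len(E2))     # uniform node weights, graph 2
        return ot.emd2(a, b, M)                 # exact 1-Wasserstein distance

A kernel matrix for a downstream classifier such as an SVM can then be obtained by exponentiating the negative distances, e.g. a Laplacian kernel exp(-lambda * D), which is how the paper turns the distance into a similarity.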


Finding significant combinations of features in the presence of categorical covariates

Neural Information Processing Systems

In high-dimensional settings, where the number of features p is much larger than the number of samples n, methods that systematically examine arbitrary combinations of features have only recently begun to be explored. However, none of the current methods can assess the association between feature combinations and a target variable while conditioning on a categorical covariate. As a result, many false discoveries may occur due to unaccounted confounding effects. We propose the Fast Automatic Conditional Search (FACS) algorithm, a significant discriminative itemset mining method that conditions on categorical covariates and scales only as O(k log k), where k is the number of states of the categorical covariate. Based on the Cochran-Mantel-Haenszel test, FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.
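
As a concrete illustration, below is a minimal Python sketch of the Cochran-Mantel-Haenszel statistic that FACS builds on: given one 2x2 contingency table per state of the categorical covariate, it pools the per-stratum evidence into a single chi-squared statistic with one degree of freedom. This is the bare test (without continuity correction), not the FACS search procedure itself; the function name is illustrative.

    import numpy as np
    from scipy.stats import chi2

    def cmh_test(tables):
        # tables: array of shape (K, 2, 2), one 2x2 contingency table per
        # state of the categorical covariate; rows index the itemset
        # (present/absent), columns index the binary target label.
        tables = np.asarray(tables, dtype=float)
        a = tables[:, 0, 0]                     # top-left cell per stratum
        row1 = tables[:, 0, :].sum(axis=1)      # first-row margins
        col1 = tables[:, :, 0].sum(axis=1)      # first-column margins
        n = tables.sum(axis=(1, 2))             # per-stratum totals
        expected = row1 * col1 / n              # E[a_k] under independence
        var = (row1 * (n - row1) * col1 * (n - col1)
               / (n ** 2 * (n - 1)))            # hypergeometric variance
        stat = (a.sum() - expected.sum()) ** 2 / var.sum()
        return stat, chi2.sf(stat, df=1)        # statistic and p-value

For example, cmh_test(np.array([[[12, 3], [5, 10]], [[8, 2], [4, 9]]])) pools the evidence from two covariate states into a single p-value, rather than testing each stratum separately and inflating the false discovery risk.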