Kleinsteuber, Martin
Barlow Graph Auto-Encoder for Unsupervised Network Embedding
Khan, Rayyan Ahmad, Kleinsteuber, Martin
Network embedding has emerged as a promising research field for network analysis. Recently, an approach, named Barlow Twins, has been proposed for self-supervised learning in computer vision by applying the redundancy-reduction principle to the embedding vectors corresponding to two distorted versions of the image samples. Motivated by this, we propose Barlow Graph Auto-Encoder, a simple yet effective architecture for learning network embedding. It aims to maximize the similarity between the embedding vectors of immediate and larger neighborhoods of a node, while minimizing the redundancy between the components of these projections. In addition, we also present the variation counterpart named as Barlow Variational Graph Auto-Encoder. Our approach yields promising results for inductive link prediction and is also on par with state of the art for clustering and downstream node classification, as demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.
Metapath- and Entity-aware Graph Neural Network for Recommendation
Han, Zhiwei, Anwaar, Muhammad Umer, Arumugaswamy, Shyam, Weber, Thomas, Qiu, Tianming, Shen, Hao, Liu, Yuanting, Kleinsteuber, Martin
Due to the shallow structure, classic graph neural networks (GNNs) failed in modelling high-order graph structures that deliver critical insights of task relevant relations. The negligence of those insights lead to insufficient distillation of collaborative signals in recommender systems. In this paper, we propose PEAGNN, a unified GNN framework tailored for recommendation tasks, which is capable of exploiting the rich semantics in metapaths. PEAGNN trains multilayer GNNs to perform metapath-aware information aggregation on collaborative subgraphs, $h$-hop subgraphs around the target user-item pairs. After the attentive fusion of aggregated information from different metapaths, a graph-level representation is then extracted for matching score prediction. To leverage the local structure of collaborative subgraphs, we present entity-awareness that regularizes node embedding with the presence of features in a contrastive manner. Moreover, PEAGNN is compatible with the mainstream GNN structures such as GCN, GAT and GraphSage. The empirical analysis on three public datasets demonstrate that our model outperforms or is at least on par with other competitive baselines. Further analysis indicates that trained PEAGNN automatically derives meaningful metapath combinations from the given metapaths.
Extended Affinity Propagation: Global Discovery and Local Insights
Khan, Rayyan Ahmad, Amjad, Rana Ali, Kleinsteuber, Martin
We propose a new clustering algorithm, Extended Affinity Propagation, based on pairwise similarities. Extended Affinity Propagation is developed by modifying Affinity Propagation such that the desirable features of Affinity Propagation, e.g., exemplars, reasonable computational complexity and no need to specify number of clusters, are preserved while the shortcomings, e.g., the lack of global structure discovery, that limit the applicability of Affinity Propagation are overcome. Extended Affinity Propagation succeeds not only in achieving this goal but can also provide various additional insights into the internal structure of the individual clusters, e.g., refined confidence values, relative cluster densities and local cluster strength in different regions of a cluster, which are valuable for an analyst. We briefly discuss how these insights can help in easily tuning the hyperparameters. We also illustrate these desirable features and the performance of Extended Affinity Propagation on various synthetic and real world datasets.
Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking
Kasai, Hiroyuki, Kellerer, Wolfgang, Kleinsteuber, Martin
This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.
Learning Co-Sparse Analysis Operators with Separable Structures
Seibert, Matthias, Wörmann, Julian, Gribonval, Rémi, Kleinsteuber, Martin
In the co-sparse analysis model a set of filters is applied to a signal out of the signal class of interest yielding sparse filter responses. As such, it may serve as a prior in inverse problems, or for structural analysis of signals that are known to belong to the signal class. The more the model is adapted to the class, the more reliable it is for these purposes. The task of learning such operators for a given class is therefore a crucial problem. In many applications, it is also required that the filter responses are obtained in a timely manner, which can be achieved by filters with a separable structure. Not only can operators of this sort be efficiently used for computing the filter responses, but they also have the advantage that less training samples are required to obtain a reliable estimate of the operator. The first contribution of this work is to give theoretical evidence for this claim by providing an upper bound for the sample complexity of the learning process. The second is a stochastic gradient descent (SGD) method designed to learn an analysis operator with separable structures, which includes a novel and efficient step size selection rule. Numerical experiments are provided that link the sample complexity to the convergence speed of the SGD algorithm.
Robust Structured Low-Rank Approximation on the Grassmannian
Hage, Clemens, Kleinsteuber, Martin
Over the past years Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures which appear in applications such as system identification and data series analysis. At the same time it has been shown how to control the rank of a structured approximation via matrix factorization approaches. The drawbacks of these methods either lie in the lack of robustness against outliers or in their static nature of repeated batch-processing. We present a Robust Structured Low-Rank Approximation method on the Grassmannian that on the one hand allows for fast re-initialization in an online setting due to subspace identification with manifolds, and that is robust against outliers due to a smooth approximation of the $\ell_p$-norm cost function on the other hand. The method is evaluated in online time series forecasting tasks on simulated and real-world data.
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Gribonval, Rémi, Jenatton, Rodolphe, Bach, Francis, Kleinsteuber, Martin, Seibert, Matthias
Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, it is achieved in practice by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates to uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibit such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity \ldots), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds behave proportional to $\sqrt{\log(n)/n}$ w.r.t.\ the number of samples $n$ for the considered matrix factorization techniques.
Separable Cosparse Analysis Operator Learning
Seibert, Matthias, Wörmann, Julian, Gribonval, Rémi, Kleinsteuber, Martin
The ability of having a sparse representation for a certain class of signals has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g. images or three-dimensional MRI scans. Most data analysis and learning algorithms use vectorized signals and thereby do not account for this underlying structure. The drawback of not taking the inherent structure into account is a dramatic increase in computational cost. We propose an algorithm for learning a cosparse Analysis Operator that adheres to the preexisting structure of the data, and thus allows for a very efficient implementation. This is achieved by enforcing a separable structure on the learned operator. Our learning algorithm is able to deal with multidimensional data of arbitrary order. We evaluate our method on volumetric data at the example of three-dimensional MRI scans.
On The Sample Complexity of Sparse Dictionary Learning
Seibert, Matthias, Kleinsteuber, Martin, Gribonval, Rémi, Jenatton, Rodolphe, Bach, Francis
In the synthesis model signals are represented as a sparse combinations of atoms from a dictionary. Dictionary learning describes the acquisition process of the underlying dictionary for a given set of training samples. While ideally this would be achieved by optimizing the expectation of the factors over the underlying distribution of the training data, in practice the necessary information about the distribution is not available. Therefore, in real world applications it is achieved by minimizing an empirical average over the available samples. The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function. This estimate then provides a suitable estimate to the accuracy of the representation of the learned dictionary. The presented approach exemplifies the general results proposed by the authors in Sample Complexity of Dictionary Learning and other Matrix Factorizations, Gribonval et al. and gives more concrete bounds of the sample complexity of dictionary learning. We cover a variety of sparsity measures employed in the learning procedure.
Separable Dictionary Learning
Hawe, Simon, Seibert, Matthias, Kleinsteuber, Martin
Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries permit to capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden for (i) learning a dictionary and for (ii) employing the dictionary for reconstruction tasks only allows to deal with relatively small image patches that only capture local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch-sizes for the learning phase, on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres which updates the dictionary as a whole, thus enforces basic dictionary properties such as mutual coherence explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.