Eldardiry, Hoda


Inductive Representation Learning in Large Attributed Graphs

arXiv.org Machine Learning

Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data lies at the heart of the success of many machine learning tasks such as classification, anomaly detection, and link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk and node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node. This has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Second, they are not space-efficient, as a feature vector is learned for each node, which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs. To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of an attributed random walk that is not tied to node identity and is instead based on learning a function $\Phi : \mathbf{x} \rightarrow w$ that maps a node attribute vector $\mathbf{x}$ to a type $w$. This framework serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many other previous methods that leverage traditional random walks.
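As a concrete illustration, the minimal sketch below simulates one attributed random walk: node attribute vectors are first mapped to a small set of types by a learned function Phi (realized here, purely as an assumption, by k-means clustering over attributes), and the walk then records types rather than node identities. The graph encoding, helper names, and parameters are illustrative, not the paper's implementation.

import random
import numpy as np
from sklearn.cluster import KMeans

def learn_phi(X, num_types=4, seed=0):
    """Learn Phi: x -> w by clustering attribute vectors into types.
    (k-means is an assumed stand-in for the paper's type function.)"""
    km = KMeans(n_clusters=num_types, n_init=10, random_state=seed).fit(X)
    return lambda x: int(km.predict(np.asarray(x).reshape(1, -1))[0])

def attributed_walk(adj, X, phi, start, length=10):
    """Random walk that emits node *types*, not node identities."""
    node, walk = start, [phi(X[start])]
    for _ in range(length - 1):
        neighbors = adj[node]
        if not neighbors:
            break
        node = random.choice(neighbors)
        walk.append(phi(X[node]))
    return walk

# Toy usage: 5 nodes with 2-d attributes on a small graph.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2], [0.5, 0.5]])
phi = learn_phi(X, num_types=2)
print(attributed_walk(adj, X, phi, start=0))  # e.g. [1, 1, 0, 0, ...]

Because the walk is a sequence of types rather than node identities, the same learned function transfers to nodes and graphs never seen during training, which is the source of the framework's inductive behavior.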


Similarity-based Multi-label Learning

arXiv.org Machine Learning

Multi-label classification is an important learning problem with many applications. In this work, we propose a principled similarity-based approach for multi-label learning called SML. We also introduce a similarity-based approach for predicting the label set size. The experimental results demonstrate the effectiveness of SML for multi-label classification, where it is shown to compare favorably with a wide variety of existing algorithms across a range of evaluation criteria.
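The abstract does not spell out SML's formulation, so the sketch below only illustrates the general similarity-based idea under assumed choices: cosine similarity, a k-nearest-neighbor vote for per-label scores, and a similarity-weighted mean of neighbors' set sizes as the predicted label set size. All names and parameters are hypothetical.

import numpy as np

def sml_style_predict(X_train, Y_train, x, k=5):
    """Score labels by similarity-weighted neighbor votes, then keep the
    top-s labels, where s is the similarity-weighted mean set size."""
    sims = X_train @ x / (np.linalg.norm(X_train, axis=1)
                          * np.linalg.norm(x) + 1e-12)
    nn = np.argsort(-sims)[:k]
    w = sims[nn] / (sims[nn].sum() + 1e-12)        # normalized weights
    scores = w @ Y_train[nn]                       # per-label scores
    s = int(round(w @ Y_train[nn].sum(axis=1)))    # predicted set size
    top = np.argsort(-scores)[:max(s, 1)]
    return set(top.tolist())

# Toy usage: 4 training points, 3 labels (rows of Y are label indicators).
X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y_train = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])
print(sml_style_predict(X_train, Y_train, np.array([0.95, 0.05])))  # e.g. {0, 1}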


Reports of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence

AI Magazine



A Framework for Generalizing Graph-based Representation Learning Methods

arXiv.org Machine Learning

Random walks are at the heart of many existing deep learning algorithms for graph data. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features learned by these methods are unable to transfer to new nodes and graphs as they are tied to node identity. In this work, we introduce the notion of attributed random walks, which serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning, as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective, with an average AUC improvement of 16.1% while requiring on average 853 times less space than existing methods on a variety of graphs from several domains.
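The sketch below suggests how attributed walks can generalize a DeepWalk-style pipeline: walks emit types, a skip-gram model learns one embedding per type, and any node, seen or unseen, is embedded through Phi. It reuses the adj, X, phi, and attributed_walk helpers from the earlier sketch and assumes gensim's Word2Vec; none of this is the paper's code.

import numpy as np
from gensim.models import Word2Vec

def embed_types(adj, X, phi, walks_per_node=10, length=10, dim=16):
    # Corpus of type sequences: identical to DeepWalk's pipeline except
    # that walks emit types (via phi) instead of node ids.
    corpus = [[str(t) for t in attributed_walk(adj, X, phi, v, length)]
              for v in adj for _ in range(walks_per_node)]
    return Word2Vec(corpus, vector_size=dim, window=3, min_count=1, sg=1)

model = embed_types(adj, X, phi)
# A node never seen during training is embedded through phi alone:
x_new = np.array([0.85, 0.15])
print(model.wv[str(phi(x_new))][:4])

Storing one vector per type rather than per node is also where the space savings come from: the number of types is fixed and small, while the number of nodes can be arbitrarily large.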


Ensemble Classification for Relational Domains

AAAI Conferences

Ensemble classification methods have been shown to produce more accurate predictions than the base component models. Due to their effectiveness, ensemble approaches have been applied in a wide range of domains to improve classification. The expected prediction error of classification models can be decomposed into bias and variance. Ensemble methods that independently construct component models (e.g., bagging) can improve performance by reducing the error due to variance, while methods that dependently construct component models (e.g., boosting) can improve performance by reducing the error due to bias and variance. Although ensemble methods were initially developed for classification of independent and identically distributed (i.i.d.) data, they can be directly applied to relational data by using a relational classifier as the base component model. This straightforward approach can improve classification for network data, but it suffers from a number of limitations. First, relational data characteristics are exploited only by the base relational classifier, and not by the ensemble algorithm itself. We note that explicitly accounting for the structured nature of relational data in the ensemble mechanism can significantly improve ensemble classification. Second, ensemble learning methods that assume i.i.d. data can fail to preserve the relational structure of non-i.i.d. data, which will (1) prevent the relational base classifiers from exploiting these structures, and (2) fail to accurately capture properties of the dataset, which can lead to inaccurate models and classifications. Third, ensemble mechanisms that assume i.i.d. data are limited to reducing errors associated with i.i.d. models and fail to reduce additional sources of error associated with more powerful models (e.g., collective classification models). Our key observation is that collective classification methods have error due to variance in inference. This has been overlooked by current ensemble methods, which assume exact inference and focus only on the typical goal of reducing errors due to learning, even when they explicitly consider relational data. Here we study the problem of ensemble classification for relational domains by focusing on the reduction of error due to variance. We propose a relational ensemble framework that explicitly accounts for the structured nature of relational data during both learning and inference. Our proposed framework consists of two components: (1) a method for learning accurate ensembles from relational data, focusing on the reduction of error due to variance in learning, while preserving the relational characteristics in the data; and (2) a method for applying ensembles in collective classification contexts, focusing on further reduction of the error due to variance in inference, which has not been considered in state-of-the-art ensemble methods.
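To make the first component concrete, here is a minimal sketch of relational bagging that keeps link structure intact: each component model is trained on a node resample, but the induced subgraph preserves the original edges among sampled nodes, so the relational base classifier can still exploit them. The weighted-vote relational neighbor (wvRN) base classifier and all parameters are assumed stand-ins, not the paper's exact learner.

import random
from collections import Counter

def induced_subgraph(adj, nodes):
    """Keep only sampled nodes, but preserve the original edges among them."""
    keep = set(nodes)
    return {v: [u for u in adj[v] if u in keep] for v in keep}

def wvrn_predict(adj, labels, v):
    """Predict v's label as the majority label of its labeled neighbors."""
    votes = Counter(labels[u] for u in adj.get(v, []) if u in labels)
    return votes.most_common(1)[0][0] if votes else None

def relational_bagging_predict(adj, labels, v, n_models=10, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = rng.choices(list(adj), k=len(adj)) + [v]  # always keep target
        sub = induced_subgraph(adj, sample)
        p = wvrn_predict(sub, labels, v)
        if p is not None:
            preds.append(p)
    return Counter(preds).most_common(1)[0][0] if preds else None

# Toy usage: node 4 is unlabeled; its neighbors carry labels.
adj = {0: [1, 4], 1: [0, 4], 2: [3, 4], 3: [2], 4: [0, 1, 2]}
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
print(relational_bagging_predict(adj, labels, 4))  # likely "A"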


Across-Model Collective Ensemble Classification

AAAI Conferences

Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approaches follow the conventional approach to ensemble learning, they improve performance by reducing the error due to variance in learning. We note, however, that variance in inference can be an additional source of error in relational methods that use collective classification, since inferred values are propagated during inference. We propose a novel ensemble mechanism for collective classification that reduces both learning and inference variance by incorporating prediction averaging into the collective inference process itself. We show that our proposed method significantly outperforms a straightforward relational ensemble baseline on both synthetic and real-world datasets.
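The sketch below illustrates the core idea of averaging predictions inside collective inference, under assumed simplifications: several component models iteratively re-estimate label beliefs for unlabeled nodes, and each propagation step blends a model's own beliefs with the across-model average, damping inference variance. The label-propagation base model and perturbed initializations are illustrative stand-ins for learned collective classifiers, not the paper's method.

import numpy as np

def collective_ensemble(adj, seeds, n_nodes, n_labels,
                        n_models=5, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Component models start from perturbed beliefs, standing in for
    # models learned on different resamples of the training data.
    beliefs = np.full((n_models, n_nodes, n_labels), 1.0 / n_labels)
    beliefs += rng.uniform(0.0, 0.05, size=beliefs.shape)
    beliefs /= beliefs.sum(axis=2, keepdims=True)
    for v, y in seeds.items():          # clamp labeled (seed) nodes
        beliefs[:, v, :] = 0.0
        beliefs[:, v, y] = 1.0
    for _ in range(n_iters):
        avg = beliefs.mean(axis=0)      # across-model average, each sweep
        for m in range(n_models):
            for v in range(n_nodes):
                if v in seeds or not adj[v]:
                    continue
                own = beliefs[m][adj[v]].mean(axis=0)
                # Blend each model's own propagation with the ensemble
                # average, folding prediction averaging into inference.
                beliefs[m, v] = 0.5 * own + 0.5 * avg[adj[v]].mean(axis=0)
    return beliefs.mean(axis=0).argmax(axis=1)

# Toy usage: two labeled seeds at the ends of a path graph.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(collective_ensemble(adj, seeds={0: 0, 4: 1}, n_nodes=5, n_labels=2))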