network dataset
Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference
Belkhiter, Yannis, Tirupathi, Seshu, Zizzo, Giulio, Dzaferagic, Merim, Kelleher, John D.
As AI becomes a native component of 6G network control, AI models must adapt to continuously changing conditions, including the introduction of new features and measurements driven by multi-vendor deployments, hardware upgrades, and evolving service requirements. To address this growing need for flexible learning in non-stationary environments, this vision paper highlights Adaptive Random Forests (ARFs) as a reliable solution for dynamic feature adaptation in communication network scenarios. We show that iterative training of ARFs can effectively lead to stable predictions, with accuracy improving over time as more features are added. In addition, we highlight the importance of explainability in AI-driven networks, proposing Drift-Aware Feature Importance (DAFI) as an efficient XAI feature importance (FI) method. DAFI uses a distributional drift detector to signal when to apply computationally intensive FI methods instead of lighter alternatives. Our tests on 3 different datasets indicate that our approach reduces runtime by up to 2 times, while producing more consistent feature importance values. Together, ARFs and DAFI provide a promising framework to build flexible AI methods adapted to 6G network use-cases.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
- Oceania > Australia > New South Wales (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > Wales (0.04)
- Telecommunications (1.00)
- Information Technology > Security & Privacy (1.00)
Causal invariant geographic network representations with feature and structural distribution shifts
Wang, Yuhan, He, Silu, Luo, Qinyao, Yuan, Hongyuan, Zhao, Ling, Zhu, Jiawei, Li, Haifeng
The existing methods learn geographic network representations through deep graph neural networks (GNNs) based on the i.i.d. assumption. However, the spatial heterogeneity and temporal dynamics of geographic data make the out-of-distribution (OOD) generalisation problem particularly salient. The latter are particularly sensitive to distribution shifts (feature and structural shifts) between testing and training data and are the main causes of the OOD generalisation problem. Spurious correlations are present between invariant and background representations due to selection biases and environmental effects, resulting in the model extremes being more likely to learn background representations. The existing approaches focus on background representation changes that are determined by shifts in the feature distributions of nodes in the training and test data while ignoring changes in the proportional distributions of heterogeneous and homogeneous neighbour nodes, which we refer to as structural distribution shifts. We propose a feature-structure mixed invariant representation learning (FSM-IRL) model that accounts for both feature distribution shifts and structural distribution shifts. To address structural distribution shifts, we introduce a sampling method based on causal attention, encouraging the model to identify nodes possessing strong causal relationships with labels or nodes that are more similar to the target node. Inspired by the Hilbert-Schmidt independence criterion, we implement a reweighting strategy to maximise the orthogonality of the node representations, thereby mitigating the spurious correlations among the node representations and suppressing the learning of background representations. Our experiments demonstrate that FSM-IRL exhibits strong learning capabilities on both geographic and social network datasets in OOD scenarios.
- Asia > China > Hunan Province (0.14)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology (0.49)
- Education (0.46)
Differential Privacy on Trust Graphs
Ghazi, Badih, Kumar, Ravi, Manurangsi, Pasin, Wang, Serena
We study differential privacy (DP) in a multi-party setting where each party only trusts a (known) subset of the other parties with its data. Specifically, given a trust graph where vertices correspond to parties and neighbors are mutually trusting, we give a DP algorithm for aggregation with a much better privacy-utility trade-off than in the well-studied local model of DP (where each party trusts no other party). We further study a robust variant where each party trusts all but an unknown subset of at most $t$ of its neighbors (where $t$ is a given parameter), and give an algorithm for this setting. We complement our algorithms with lower bounds, and discuss implications of our work to other tasks in private learning and analytics.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > South Korea (0.04)
- Health & Medicine (0.68)
- Information Technology > Security & Privacy (0.68)
Self-supervision meets kernel graph neural models: From architecture to augmentations
Dan, Jiawang, Wu, Ruofan, Liu, Yunpeng, Wang, Baokun, Meng, Changhua, Liu, Tengfei, Zhang, Tianyi, Wang, Ningtao, Fu, Xing, Li, Qi, Wang, Weiqiang
Graph representation learning has now become the de facto standard when handling graph-structured data, with the framework of message-passing graph neural networks (MPNN) being the most prevailing algorithmic tool. Despite its popularity, the family of MPNNs suffers from several drawbacks such as transparency and expressivity. Recently, the idea of designing neural models on graphs using the theory of graph kernels has emerged as a more transparent as well as sometimes more expressive alternative to MPNNs known as kernel graph neural networks (KGNNs). Developments on KGNNs are currently a nascent field of research, leaving several challenges from algorithmic design and adaptation to other learning paradigms such as self-supervised learning. In this paper, we improve the design and learning of KGNNs. Firstly, we extend the algorithmic formulation of KGNNs by allowing a more flexible graph-level similarity definition that encompasses former proposals like random walk graph kernel, as well as providing a smoother optimization objective that alleviates the need of introducing combinatorial learning procedures. Secondly, we enhance KGNNs through the lens of self-supervision via developing a novel structure-preserving graph data augmentation method called latent graph augmentation (LGA). Finally, we perform extensive empirical evaluations to demonstrate the efficacy of our proposed mechanisms. Experimental results over benchmark datasets suggest that our proposed model achieves competitive performance that is comparable to or sometimes outperforming state-of-the-art graph representation learning frameworks with or without self-supervision on graph classification tasks. Comparisons against other previously established graph data augmentation methods verify that the proposed LGA augmentation scheme captures better semantics of graph-level invariance.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Hyperbolic Random Forests
Doorenbos, Lars, Márquez-Neila, Pablo, Sznitman, Raphael, Mettes, Pascal
Hyperbolic space is becoming a popular choice for representing data due to the hierarchical structure - whether implicit or explicit - of many real-world datasets. Along with it comes a need for algorithms capable of solving fundamental tasks, such as classification, in hyperbolic space. Recently, multiple papers have investigated hyperbolic alternatives to hyperplane-based classifiers, such as logistic regression and SVMs. While effective, these approaches struggle with more complex hierarchical data. We, therefore, propose to generalize the well-known random forests to hyperbolic space. We do this by redefining the notion of a split using horospheres. Since finding the globally optimal split is computationally intractable, we find candidate horospheres through a large-margin classifier. To make hyperbolic random forests work on multi-class data and imbalanced experiments, we furthermore outline a new method for combining classes based on their lowest common ancestor and a class-balanced version of the large-margin loss. Experiments on standard and new benchmarks show that our approach outperforms both conventional random forest algorithms and recent hyperbolic classifiers.
- North America > United States (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Greece > Attica > Athens (0.04)
Subgraph Detection Using Eigenvector L1 Norms
When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a detection theory" for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph's so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach."
DPPIN: A Biological Repository of Dynamic Protein-Protein Interaction Network Data
In the big data era, the relationship between entries becomes more and more complex. Many graph (or network) algorithms have already paid attention to dynamic networks, which are more suitable than static ones for fitting the complex real-world scenarios with evolving structures and features. To contribute to the dynamic network representation learning and mining research, we provide a new bunch of label-adequate, dynamics-meaningful, and attribute-sufficient dynamic networks from the health domain. To be specific, in our proposed repository DPPIN, we totally have 12 individual dynamic network datasets at different scales, and each dataset is a dynamic protein-protein interaction network describing protein-level interactions of yeast cells. We hope these domain-specific node features, structure evolution patterns, and node and graph labels could inspire the regularization techniques to increase the performance of graph machine learning algorithms in a more complex setting. Also, we link potential applications with our DPPIN by designing various dynamic graph experiments, where DPPIN could indicate future research opportunities for some tasks by presenting challenges on state-of-the-art baseline algorithms. Finally, we identify future directions to improve the utility of this repository and welcome constructive inputs from the community. All resources (e.g., data and code) of this work are deployed and publicly available at https://github.com/DongqiFu/DPPIN.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
Hypergraph convolutional neural network-based clustering technique
Tran, Loc H., Trinh, Nguyen, Tran, Linh H.
This paper constitutes the novel hypergraph convolutional neural networkbased clustering technique. This technique is employed to solve the clustering problem for the Citeseer dataset and the Cora dataset. Each dataset contains the feature matrix and the incidence matrix of the hypergraph (i.e., constructed from the feature matrix). This novel clustering method utilizes both matrices. Initially, the hypergraph auto-encoders are employed to transform both the incidence matrix and the feature matrix from high dimensional space to low dimensional space. In the end, we apply the k-means clustering technique to the transformed matrix. The hypergraph convolutional neural network (CNN)-based clustering technique presented a better result on performance during experiments than those of the other classical clustering techniques.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.06)
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- (2 more...)
Unsupervised Ensemble Based Deep Learning Approach for Attack Detection in IoT Network
Ahmed, Mir Shahnawaz, Shah, Shahid Mehraj
The Internet of Things (IoT) has altered living by controlling devices/things over the Internet. IoT has specified many smart solutions for daily problems, transforming cyber-physical systems (CPS) and other classical fields into smart regions. Most of the edge devices that make up the Internet of Things have very minimal processing power. To bring down the IoT network, attackers can utilise these devices to conduct a variety of network attacks. In addition, as more and more IoT devices are added, the potential for new and unknown threats grows exponentially. For this reason, an intelligent security framework for IoT networks must be developed that can identify such threats. In this paper, we have developed an unsupervised ensemble learning model that is able to detect new or unknown attacks in an IoT network from an unlabelled dataset. The system-generated labelled dataset is used to train a deep learning model to detect IoT network attacks. Additionally, the research presents a feature selection mechanism for identifying the most relevant aspects in the dataset for detecting attacks. The study shows that the suggested model is able to identify the unlabelled IoT network datasets and DBN (Deep Belief Network) outperform the other models with a detection accuracy of 97.5% and a false alarm rate of 2.3% when trained using labelled dataset supplied by the proposed approach.
- Information Technology > Internet of Things (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Improving the Reliability of Network Intrusion Detection Systems through Dataset Integration
Magán-Carrión, Roberto, Urda, Daniel, Díaz-Cano, Ignacio, Dorronsoro, Bernabé
This work presents Reliable-NIDS (R-NIDS), a novel methodology for Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) that allows ML models to work on integrated datasets, empowering the learning process with diverse information from different datasets. Therefore, R-NIDS targets the design of more robust models, that generalize better than traditional approaches. We also propose a new dataset, called UNK21. It is built from three of the most well-known network datasets (UGR'16, USNW-NB15 and NLS-KDD), each one gathered from its own network environment, with different features and classes, by using a data aggregation approach present in R-NIDS. Following R-NIDS, in this work we propose to build two well-known ML models (a linear and a non-linear one) based on the information of three of the most common datasets in the literature for NIDS evaluation, those integrated in UNK21. The results that the proposed methodology offers show how these two ML models trained as a NIDS solution could benefit from this approach, being able to generalize better when training on the newly proposed UNK21 dataset. Furthermore, these results are carefully analyzed with statistical tools that provide high confidence on our conclusions.
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- Europe > Spain > Castile and León > Burgos Province > Burgos (0.04)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- (3 more...)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)