ugr
Quality In / Quality Out: Assessing Data quality in an Anomaly Detection Benchmark
Camacho, José, Wasielewska, Katarzyna, Fuentes-García, Marta, Rodríguez-Gómez, Rafael
Autonomous or self-driving networks are expected to provide a solution to the myriad of extremely demanding new applications in the Future Internet. The key to handling complexity is to perform tasks like network optimization and failure recovery with minimal human supervision. For this purpose, the community relies on the development of new Machine Learning (ML) models and techniques. However, ML can only be as good as the data it is fitted with. Datasets provided to the community as benchmarks for research purposes, which have a relevant impact on research findings and directions, are often assumed to be of good quality by default. In this paper, we show that relatively minor modifications to the same benchmark dataset (UGR'16, a flow-based real-traffic dataset for anomaly detection) have a significantly greater impact on model performance than the specific ML technique considered. To understand this finding, we contribute a methodology to investigate the root causes of those differences and to assess the quality of the data labelling. Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks.
- Europe > Spain > Andalusia > Granada Province > Granada (0.05)
- Asia > Macao (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.48)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Leveraging a Probabilistic PCA Model to Understand the Multivariate Statistical Network Monitoring Framework for Network Security Anomaly Detection
Pérez-Bueno, Fernando, García, Luz, Maciá-Fernández, Gabriel, Molina, Rafael
Network anomaly detection is a very relevant research area, especially due to its multiple applications in the field of network security. The surge of new models based on variational autoencoders and generative adversarial networks has motivated a reevaluation of traditional techniques for anomaly detection. It is, however, essential to be able to understand these new models from the perspective of the experience gained from years of evaluating network security data for anomaly detection. In this paper, we revisit anomaly detection techniques based on PCA from a probabilistic generative model point of view, and contribute a mathematical model that relates them. Specifically, we start with the probabilistic PCA model and explain its connection to the Multivariate Statistical Network Monitoring (MSNM) framework. MSNM was recently and successfully proposed as a means of incorporating industrial process anomaly detection experience into the field of networking. We have evaluated the mathematical model using two different datasets. The first is a synthetic dataset created to better understand the proposed analysis; the second, UGR'16, is a real-traffic dataset specifically designed for network security anomaly detection. We have drawn conclusions that we consider to be useful when applying generative models to network security detection.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Asia > Middle East > Iran (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Networks (1.00)
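To make the PCA-based monitoring idea in the Pérez-Bueno et al. abstract above more concrete, here is a minimal sketch using NumPy on synthetic data. It is not the paper's model or the MSNM reference implementation. It only illustrates the two statistics commonly used in PCA-based multivariate statistical monitoring: the D-statistic (Hotelling's T², distance inside the PCA subspace) and the Q-statistic (squared residual outside it). The function names `fit_pca_monitor` and `d_and_q`, the mixing matrix `W`, the choice of two components, and the empirical 99th-percentile control limits are all assumptions made for this example.

```python
import numpy as np


def fit_pca_monitor(X_cal, n_components):
    """Fit a PCA monitoring model on calibration data assumed to be anomaly-free."""
    mean = X_cal.mean(axis=0)
    std = X_cal.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X_cal - mean) / std
    # SVD of the standardized data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                           # loadings (features x components)
    var = (s[:n_components] ** 2) / (Z.shape[0] - 1)  # variance of each score
    return {"mean": mean, "std": std, "P": P, "var": var}


def d_and_q(model, X):
    """Return the D-statistic and Q-statistic for each row of X."""
    Z = (X - model["mean"]) / model["std"]
    T = Z @ model["P"]                           # scores in the PCA subspace
    D = np.sum(T ** 2 / model["var"], axis=1)    # Hotelling's T^2
    E = Z - T @ model["P"].T                     # residuals outside the subspace
    Q = np.sum(E ** 2, axis=1)                   # squared prediction error
    return D, Q


# Synthetic calibration data (hypothetical): 5 features driven by 2 latent factors.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.8, 0.5, -0.3, 0.2],
              [0.1, -0.4, 0.9, 0.7, -0.6]])
latent = rng.normal(size=(500, 2))
X_cal = latent @ W + 0.05 * rng.normal(size=(500, 5))

model = fit_pca_monitor(X_cal, n_components=2)
D_cal, Q_cal = d_and_q(model, X_cal)

# Empirical control limits (an assumption; theoretical limits are also common).
d_lim = np.percentile(D_cal, 99)
q_lim = np.percentile(Q_cal, 99)

# Two test observations:
#  - one that follows the correlation structure but is extreme (should raise D)
#  - one that breaks the correlation structure (should raise Q)
x_in_model = np.array([6.0, -6.0]) @ W
x_off_model = X_cal.mean(axis=0) + np.array([3.0, -3.0, 3.0, -3.0, 3.0]) * X_cal.std(axis=0, ddof=1)
X_new = np.vstack([x_in_model, x_off_model])

D_new, Q_new = d_and_q(model, X_new)
flags = (D_new > d_lim) | (Q_new > q_lim)
print("D:", D_new, "limit:", d_lim)
print("Q:", Q_new, "limit:", q_lim)
print("flagged:", flags)
```

Keeping the two statistics separate is a deliberate choice in this sketch: an observation can be anomalous because it is an extreme instance of normal behaviour (high D) or because it violates the learned correlation structure (high Q), and the two cases usually call for different diagnosis.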
Improving the Reliability of Network Intrusion Detection Systems through Dataset Integration
Magán-Carrión, Roberto, Urda, Daniel, Díaz-Cano, Ignacio, Dorronsoro, Bernabé
This work presents Reliable-NIDS (R-NIDS), a novel methodology for Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) that allows ML models to work on integrated datasets, empowering the learning process with diverse information from different datasets. R-NIDS thus targets the design of more robust models that generalize better than traditional approaches. We also propose a new dataset, called UNK21. It is built from three of the most well-known network datasets (UGR'16, UNSW-NB15 and NSL-KDD), each one gathered from its own network environment, with different features and classes, by using a data aggregation approach present in R-NIDS. Following R-NIDS, in this work we propose to build two well-known ML models (a linear and a non-linear one) based on the information of three of the most common datasets in the literature for NIDS evaluation, those integrated in UNK21. The results obtained with the proposed methodology show how these two ML models, trained as a NIDS solution, could benefit from this approach, generalizing better when trained on the newly proposed UNK21 dataset. Furthermore, these results are carefully analyzed with statistical tools that provide high confidence in our conclusions.
- Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
- Europe > Spain > Castile and León > Burgos Province > Burgos (0.04)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
UCF's 30-Year REU Site in Computer Vision
The U.S. Government's National Science Foundation (NSF) started the Research Experiences for Undergraduates (REU) program in the mid-1980s to attract undergraduates in STEM fields to research careers and to encourage them to consider graduate school. The REU program offers grants to universities to plan and oversee research experiences that enrich undergraduate students' education. It is believed these experiences encourage the participants to pursue leadership careers in the fields of science, technology, engineering, or mathematics. The University of Central Florida's (UCF) Computer Vision group was among the first sites selected: only three REU sites in NSF's Division of Computer and Information Science and Engineering (CISE) were awarded funding in 1987. The grant duration was one year, so continued funding required a new application for renewal the following year.