AITopics

Bertrand, Simon, Tawbi, Nadia, Desharnais, Josée

Unsupervised User-Based Insider Threat Detection Using Bayesian Gaussian Mixture Models

Insider threats are a growing concern for organizations due to the amount of damage that their members can inflict by combining their privileged access and domain knowledge. Nonetheless, the detection of such threats is challenging, precisely because of the ability of the authorized personnel to easily conduct malicious actions and because of the immense size and diversity of audit data produced by organizations in which the few malicious footprints are hidden. In this paper, we propose an unsupervised insider threat detection system based on audit data using Bayesian Gaussian Mixture Models. The proposed approach leverages a user-based model to optimize specific behaviors modelization and an automatic feature extraction system based on Word2Vec for ease of use in a real-life scenario. The solution distinguishes itself by not requiring data balancing nor to be trained only on normal instances, and by its little domain knowledge required to implement. Still, results indicate that the proposed method competes with state-of-the-art approaches, presenting a good recall of 88\%, accuracy and true negative rate of 93%, and a false positive rate of 6.9%. For our experiments, we used the benchmark dataset CERT version 4.2.

artificial intelligence, detection, machine learning, (14 more...)

2211.14437

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > Quebec (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)
Overview > Growing Problem (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Meister, Julia A., Nguyen, Khuong An, Luo, Zhiyuan

Audio feature ranking for sound-based COVID-19 patient detection

Audio classification using breath and cough samples has recently emerged as a low-cost, non-invasive, and accessible COVID-19 screening method. However, a comprehensive survey shows that no application has been approved for official use at the time of writing, due to the stringent reliability and accuracy requirements of the critical healthcare setting. To support the development of Machine Learning classification models, we performed an extensive comparative investigation and ranking of 15 audio features, including less well-known ones. The results were verified on two independent COVID-19 sound datasets. By using the identified top-performing features, we have increased COVID-19 classification accuracy by up to 17% on the Cambridge dataset and up to 10% on the Coswara dataset compared to the original baseline accuracies without our feature ranking.

artificial intelligence, dataset, machine learning, (13 more...)

doi: 10.1007/978-3-031-16474-3_13

2104.07128

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Lohrer, Andreas, Kazempour, Daniyal, Hünemörder, Maximilian, Kröger, Peer

CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD

Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier data sets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given data set. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our outlier detection method using coMAD-PCA defines dependent on its variant an inlier region with a robust noise margin by measures of in-distribution (ID) and out-of-distribution (OOD). These measures allow distribution based outlier scoring for each principal component, and thus, for an appropriate alignment of the decision boundary between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), recall and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks.

artificial intelligence, data mining, machine learning, (13 more...)

2211.13314

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Duran, Audrey, Dussert, Gaspard, Rouvière, Olivier, Jaouen, Tristan, Jodoin, Pierre-Marc, Lartizien, Carole

ProstAttention-Net: A deep attention model for prostate cancer segmentation by aggressiveness in MRI scans

Multiparametric magnetic resonance imaging (mp-MRI) has shown excellent results in the detection of prostate cancer (PCa). However, characterizing prostate lesions aggressiveness in mp-MRI sequences is impossible in clinical practice, and biopsy remains the reference to determine the Gleason score (GS). In this work, we propose a novel end-to-end multi-class network that jointly segments the prostate gland and cancer lesions with GS group grading. After encoding the information on a latent space, the network is separated in two branches: 1) the first branch performs prostate segmentation 2) the second branch uses this zonal prior as an attention gate for the detection and grading of prostate lesions. The model was trained and validated with a 5-fold cross-validation on an heterogeneous series of 219 MRI exams acquired on three different scanners prior prostatectomy. In the free-response receiver operating characteristics (FROC) analysis for clinically significant lesions (defined as GS > 6) detection, our model achieves 69.0% $\pm$14.5% sensitivity at 2.9 false positive per patient on the whole prostate and 70.8% $\pm$14.4% sensitivity at 1.5 false positive when considering the peripheral zone (PZ) only. Regarding the automatic GS group

artificial intelligence, lesion, machine learning, (18 more...)

doi: 10.1016/j.media.2021.102347

2211.13238

Country:

North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > Canada > Quebec > Estrie Region > Sherbrooke (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Urology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Dziedzic, Adam, Choquette-Choo, Christopher A, Dullerud, Natalie, Suriyakumar, Vinith Menon, Shamsabadi, Ali Shahin, Kaleem, Muhammad Ahmad, Jha, Somesh, Papernot, Nicolas, Wang, Xiao

Private Multi-Winner Voting for Machine Learning

Private multi-winner voting is the task of revealing $k$-hot binary vectors satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in machine learning literature despite its prevalence in many domains such as healthcare. We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting. Binary voting operates independently per label through composition. $\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique: PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.

data mining, machine learning, mechanism, (17 more...)

2211.1541

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(9 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.95)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Sarhan, Mohanad, Layeghy, Siamak, Portmann, Marius

Feature Analysis for Machine Learning-based IoT Intrusion Detection

Internet of Things (IoT) networks have become an increasingly attractive target of cyberattacks. Powerful Machine Learning (ML) models have recently been adopted to implement network intrusion detection systems to protect IoT networks. For the successful training of such ML models, selecting the right data features is crucial, maximising the detection accuracy and computational efficiency. This paper comprehensively analyses feature sets' importance and predictive power for detecting network attacks. Three feature selection algorithms: chi-square, information gain and correlation, have been utilised to identify and rank data features. The attributes are fed into two ML classifiers: deep feed-forward and random forest, to measure their attack detection performance. The experimental evaluation considered three datasets: UNSW-NB15, CSE-CIC-IDS2018, and ToN-IoT in their proprietary flow format. In addition, the respective variants in NetFlow format were also considered, i.e., NF-UNSW-NB15, NF-CSE-CIC-IDS2018, and NF-ToN-IoT. The experimental evaluation explored the marginal benefit of adding individual features. Our results show that the accuracy initially increases rapidly with adding features but converges quickly to the maximum. This demonstrates a significant potential to reduce the computational and storage cost of intrusion detection systems while maintaining near-optimal detection accuracy. This has particular relevance in IoT systems, with typically limited computational and storage resources.

artificial intelligence, dataset, machine learning, (14 more...)

2108.12732

Country:

Oceania > Australia > Queensland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Poenaru-Olaru, Lorena, Cruz, Luis, van Deursen, Arie, Rellermeyer, Jan S.

Are Concept Drift Detectors Reliable Alarming Systems? -- A Comparative Study

As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly evaluated on new streaming data. Given the continuous data flow, shifting data, also known as concept drift, is ubiquitous in such settings. Concept drift usually impacts the performance of machine learning models, thus, identifying the moment when concept drift occurs is required. Concept drift is identified through concept drift detectors. In this work, we assess the reliability of concept drift detectors to identify drift in time by exploring how late are they reporting drifts and how many false alarms are they signaling. We compare the performance of the most popular drift detectors belonging to two different concept drift detector groups, error rate-based detectors and data distribution-based detectors. We assess their performance on both synthetic and real-world data. In the case of synthetic data, we investigate the performance of detectors to identify two types of concept drift, abrupt and gradual. Our findings aim to help practitioners understand which drift detector should be employed in different situations and, to achieve this, we share a list of the most important observations made throughout this study, which can serve as guidelines for practical usage. Furthermore, based on our empirical results, we analyze the suitability of each concept drift detection group to be used as alarming system.

artificial intelligence, detector, machine learning, (18 more...)

2211.13098

Country:

Europe > Netherlands > South Holland > Delft (0.05)
Oceania > Australia > New South Wales (0.04)
Oceania > Australia > South Australia (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Chaudhuri, Ayushi, Nandi, Arijit, Pradhan, Buddhadeb

A Dynamic Weighted Federated Learning for Android Malware Classification

Android malware attacks are increasing daily at a tremendous volume, making Android users more vulnerable to cyber-attacks. Researchers have developed many machine learning (ML)/ deep learning (DL) techniques to detect and mitigate android malware attacks. However, due to technological advancement, there is a rise in android mobile devices. Furthermore, the devices are geographically dispersed, resulting in distributed data. In such scenario, traditional ML/DL techniques are infeasible since all of these approaches require the data to be kept in a central system; this may provide a problem for user privacy because of the massive proliferation of Android mobile devices; putting the data in a central system creates an overhead. Also, the traditional ML/DL-based android malware classification techniques are not scalable. Researchers have proposed federated learning (FL) based android malware classification system to solve the privacy preservation and scalability with high classification performance. In traditional FL, Federated Averaging (FedAvg) is utilized to construct the global model at each round by merging all of the local models obtained from all of the customers that participated in the FL. However, the conventional FedAvg has a disadvantage: if one poor-performing local model is included in global model development for each round, it may result in an under-performing global model. Because FedAvg favors all local models equally when averaging. To address this issue, our main objective in this work is to design a dynamic weighted federated averaging (DW-FedAvg) strategy in which the weights for each local model are automatically updated based on their performance at the client. The DW-FedAvg is evaluated using four popular benchmark datasets, Melgenome, Drebin, Kronodroid and Tuandromd used in android malware classification research.

artificial intelligence, local model, machine learning, (12 more...)

doi: 10.1007/978-981-19-9858-4_13

2211.12874

Country: Asia > India (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.35)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.98)

Redwan, Sadi Md., Uddin, Md Palash, Ulhaq, Anwaar, Sharif, Muhammad Imran

Power Spectral Density-Based Resting-State EEG Classification of First-Episode Psychosis

arXiv.org Artificial IntelligenceNov-22-2022

Historically, the analysis of stimulus-dependent time-frequency patterns has been the cornerstone of most electroencephalography (EEG) studies. The abnormal oscillations in high-frequency waves associated with psychotic disorders during sensory and cognitive tasks have been studied many times. However, any significant dissimilarity in the resting-state low-frequency bands is yet to be established. Spectral analysis of the alpha and delta band waves shows the effectiveness of stimulus-independent EEG in identifying the abnormal activity patterns of pathological brains. A generalized model incorporating multiple frequency bands should be more efficient in associating potential EEG biomarkers with First-Episode Psychosis (FEP), leading to an accurate diagnosis. We explore multiple machine-learning methods, including random-forest, support vector machine, and Gaussian Process Classifier (GPC), to demonstrate the practicality of resting-state Power Spectral Density (PSD) to distinguish patients of FEP from healthy controls. A comprehensive discussion of our preprocessing methods for PSD analysis and a detailed comparison of different models are included in this paper. The GPC model outperforms the other models with a specificity of 95.78% to show that PSD can be used as an effective feature extraction technique for analyzing and classifying resting-state EEG signals of psychiatric disorders.

artificial intelligence, classification, machine learning, (14 more...)

2301.01588

Country:

Oceania > Australia (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Bangladesh (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)