AITopics

2211.13445

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Colombo, Pierre, Gomes, Eduardo D. C., Staerman, Guillaume, Noiry, Nathan, Piantanida, Pablo

Beyond Mahalanobis-Based Scores for Textual OOD Detection

arXiv.org Artificial IntelligenceNov-24-2022

Deep learning methods have boosted the adoption of NLP systems in real-life applications. However, they turn out to be vulnerable to distribution shifts over time which may cause severe dysfunctions in production systems, urging practitioners to develop tools to detect out-of-distribution (OOD) samples through the lens of the neural network. In this paper, we introduce TRUSTED, a new OOD detector for classifiers based on Transformer architectures that meets operational requirements: it is unsupervised and fast to compute. The efficiency of TRUSTED relies on the fruitful idea that all hidden layers carry relevant information to detect OOD examples. Based on this, for a given input, TRUSTED consists in (i) aggregating this information and (ii) computing a similarity score by exploiting the training distribution, leveraging the powerful concept of data depth. Our extensive numerical experiments involve 51k model configurations, including various checkpoints, seeds, and datasets, and demonstrate that TRUSTED achieves state-of-the-art performances. In particular, it improves previous AUROC over 3 points.

artificial intelligence, machine learning, msp 0, (15 more...)

2211.13527

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Berlin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report > New Finding (0.45)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Schubert, David, Gupta, Pritha, Wever, Marcel

Meta-Learning for Automated Selection of Anomaly Detectors for Semi-Supervised Datasets

arXiv.org Artificial IntelligenceNov-24-2022

In anomaly detection, a prominent task is to induce a model to identify anomalies learned solely based on normal data. Generally, one is interested in finding an anomaly detector that correctly identifies anomalies, i.e., data points that do not belong to the normal class, without raising too many false alarms. Which anomaly detector is best suited depends on the dataset at hand and thus needs to be tailored. The quality of an anomaly detector may be assessed via confusion-based metrics such as the Matthews correlation coefficient (MCC). However, since during training only normal data is available in a semi-supervised setting, such metrics are not accessible. To facilitate automated machine learning for anomaly detectors, we propose to employ meta-learning to predict MCC scores based on metrics that can be computed with normal data only. First promising results can be obtained considering the hypervolume and the false positive rate as meta-features.

artificial intelligence, data mining, machine learning, (14 more...)

2211.13681

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > France (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Machine LearningNov-23-2022

Quantitative deterministic equivalent of sample covariance matrices with a general dependence structure

Chouard, Clément

We study sample covariance matrices arising from rectangular random matrices with i.i.d. columns. It was previously known that the resolvent of these matrices admits a deterministic equivalent when the spectral parameter stays bounded away from the real axis. We extend this work by proving quantitative bounds involving both the dimensions and the spectral parameter, in particular allowing it to get closer to the real positive semi-line. As applications, we obtain a new bound for the convergence in Kolmogorov distance of the empirical spectral distributions of these general models. We also apply our framework to the problem of regularization of Random Features models in Machine Learning without Gaussian hypothesis.

artificial intelligence, deterministic equiv alent, machine learning, (15 more...)

arXiv.org Machine Learning

2211.13044

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > Indiana (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Bertrand, Simon, Tawbi, Nadia, Desharnais, Josée

Unsupervised User-Based Insider Threat Detection Using Bayesian Gaussian Mixture Models

Insider threats are a growing concern for organizations due to the amount of damage that their members can inflict by combining their privileged access and domain knowledge. Nonetheless, the detection of such threats is challenging, precisely because of the ability of the authorized personnel to easily conduct malicious actions and because of the immense size and diversity of audit data produced by organizations in which the few malicious footprints are hidden. In this paper, we propose an unsupervised insider threat detection system based on audit data using Bayesian Gaussian Mixture Models. The proposed approach leverages a user-based model to optimize specific behaviors modelization and an automatic feature extraction system based on Word2Vec for ease of use in a real-life scenario. The solution distinguishes itself by not requiring data balancing nor to be trained only on normal instances, and by its little domain knowledge required to implement. Still, results indicate that the proposed method competes with state-of-the-art approaches, presenting a good recall of 88\%, accuracy and true negative rate of 93%, and a false positive rate of 6.9%. For our experiments, we used the benchmark dataset CERT version 4.2.

artificial intelligence, detection, machine learning, (14 more...)

2211.14437

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > Quebec (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)
Overview > Growing Problem (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Meister, Julia A., Nguyen, Khuong An, Luo, Zhiyuan

Audio feature ranking for sound-based COVID-19 patient detection

Audio classification using breath and cough samples has recently emerged as a low-cost, non-invasive, and accessible COVID-19 screening method. However, a comprehensive survey shows that no application has been approved for official use at the time of writing, due to the stringent reliability and accuracy requirements of the critical healthcare setting. To support the development of Machine Learning classification models, we performed an extensive comparative investigation and ranking of 15 audio features, including less well-known ones. The results were verified on two independent COVID-19 sound datasets. By using the identified top-performing features, we have increased COVID-19 classification accuracy by up to 17% on the Cambridge dataset and up to 10% on the Coswara dataset compared to the original baseline accuracies without our feature ranking.

artificial intelligence, dataset, machine learning, (13 more...)

doi: 10.1007/978-3-031-16474-3_13

2104.07128

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Lohrer, Andreas, Kazempour, Daniyal, Hünemörder, Maximilian, Kröger, Peer

CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD

Unsupervised learning methods are well established in the area of anomaly detection and achieve state of the art performances on outlier data sets. Outliers play a significant role, since they bear the potential to distort the predictions of a machine learning algorithm on a given data set. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Our outlier detection method using coMAD-PCA defines dependent on its variant an inlier region with a robust noise margin by measures of in-distribution (ID) and out-of-distribution (OOD). These measures allow distribution based outlier scoring for each principal component, and thus, for an appropriate alignment of the decision boundary between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive to well established methods related to average precision (AP), recall and area under the receiver operating characteristic (AUROC) curve. In summary our approach can be seen as a robust alternative for outlier detection tasks.

artificial intelligence, data mining, machine learning, (13 more...)

2211.13314

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Duran, Audrey, Dussert, Gaspard, Rouvière, Olivier, Jaouen, Tristan, Jodoin, Pierre-Marc, Lartizien, Carole

ProstAttention-Net: A deep attention model for prostate cancer segmentation by aggressiveness in MRI scans

Multiparametric magnetic resonance imaging (mp-MRI) has shown excellent results in the detection of prostate cancer (PCa). However, characterizing prostate lesions aggressiveness in mp-MRI sequences is impossible in clinical practice, and biopsy remains the reference to determine the Gleason score (GS). In this work, we propose a novel end-to-end multi-class network that jointly segments the prostate gland and cancer lesions with GS group grading. After encoding the information on a latent space, the network is separated in two branches: 1) the first branch performs prostate segmentation 2) the second branch uses this zonal prior as an attention gate for the detection and grading of prostate lesions. The model was trained and validated with a 5-fold cross-validation on an heterogeneous series of 219 MRI exams acquired on three different scanners prior prostatectomy. In the free-response receiver operating characteristics (FROC) analysis for clinically significant lesions (defined as GS > 6) detection, our model achieves 69.0% $\pm$14.5% sensitivity at 2.9 false positive per patient on the whole prostate and 70.8% $\pm$14.4% sensitivity at 1.5 false positive when considering the peripheral zone (PZ) only. Regarding the automatic GS group

artificial intelligence, lesion, machine learning, (18 more...)

doi: 10.1016/j.media.2021.102347

2211.13238

Country:

North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > Canada > Quebec > Estrie Region > Sherbrooke (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Urology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Dziedzic, Adam, Choquette-Choo, Christopher A, Dullerud, Natalie, Suriyakumar, Vinith Menon, Shamsabadi, Ali Shahin, Kaleem, Muhammad Ahmad, Jha, Somesh, Papernot, Nicolas, Wang, Xiao

Private Multi-Winner Voting for Machine Learning

Private multi-winner voting is the task of revealing $k$-hot binary vectors satisfying a bounded differential privacy (DP) guarantee. This task has been understudied in machine learning literature despite its prevalence in many domains such as healthcare. We propose three new DP multi-winner mechanisms: Binary, $\tau$, and Powerset voting. Binary voting operates independently per label through composition. $\tau$ voting bounds votes optimally in their $\ell_2$ norm for tight data-independent guarantees. Powerset voting operates over the entire binary vector by viewing the possible outcomes as a power set. Our theoretical and empirical analysis shows that Binary voting can be a competitive mechanism on many tasks unless there are strong correlations between labels, in which case Powerset voting outperforms it. We use our mechanisms to enable privacy-preserving multi-label learning in the central setting by extending the canonical single-label technique: PATE. We find that our techniques outperform current state-of-the-art approaches on large, real-world healthcare data and standard multi-label benchmarks. We further enable multi-label confidential and private collaborative (CaPC) learning and show that model performance can be significantly improved in the multi-site setting.

data mining, machine learning, mechanism, (17 more...)

2211.1541

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(9 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.95)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Sarhan, Mohanad, Layeghy, Siamak, Portmann, Marius

Feature Analysis for Machine Learning-based IoT Intrusion Detection

Internet of Things (IoT) networks have become an increasingly attractive target of cyberattacks. Powerful Machine Learning (ML) models have recently been adopted to implement network intrusion detection systems to protect IoT networks. For the successful training of such ML models, selecting the right data features is crucial, maximising the detection accuracy and computational efficiency. This paper comprehensively analyses feature sets' importance and predictive power for detecting network attacks. Three feature selection algorithms: chi-square, information gain and correlation, have been utilised to identify and rank data features. The attributes are fed into two ML classifiers: deep feed-forward and random forest, to measure their attack detection performance. The experimental evaluation considered three datasets: UNSW-NB15, CSE-CIC-IDS2018, and ToN-IoT in their proprietary flow format. In addition, the respective variants in NetFlow format were also considered, i.e., NF-UNSW-NB15, NF-CSE-CIC-IDS2018, and NF-ToN-IoT. The experimental evaluation explored the marginal benefit of adding individual features. Our results show that the accuracy initially increases rapidly with adding features but converges quickly to the maximum. This demonstrates a significant potential to reduce the computational and storage cost of intrusion detection systems while maintaining near-optimal detection accuracy. This has particular relevance in IoT systems, with typically limited computational and storage resources.

artificial intelligence, dataset, machine learning, (14 more...)

2108.12732

Country:

Oceania > Australia > Queensland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)