AITopics

The application of Artificial Intelligence (AI) and Machine Learning (ML) to cybersecurity challenges has gained traction in industry and academia, partially as a result of widespread malware attacks on critical systems such as cloud infrastructures and government institutions. Intrusion Detection Systems (IDS), using some forms of AI, have received widespread adoption due to their ability to handle vast amounts of data with a high prediction accuracy. These systems are hosted in the organizational Cyber Security Operation Center (CSoC) as a defense tool to monitor and detect malicious network flow that would otherwise impact the Confidentiality, Integrity, and Availability (CIA). CSoC analysts rely on these systems to make decisions about the detected threats. However, IDSs designed using Deep Learning (DL) techniques are often treated as black box models and do not provide a justification for their predictions. This creates a barrier for CSoC analysts, as they are unable to improve their decisions based on the model's predictions. One solution to this problem is to design explainable IDS (X-IDS). This survey reviews the state-of-the-art in explainable AI (XAI) for IDS, its current challenges, and discusses how these challenges span to the design of an X-IDS. In particular, we discuss black box and white box approaches comprehensively. We also present the tradeoff between these approaches in terms of their performance and ability to produce explanations. Furthermore, we propose a generic architecture that considers human-in-the-loop which can be used as a guideline when designing an X-IDS. Research recommendations are given from three critical viewpoints: the need to define explainability for IDS, the need to create explanations tailored to various stakeholders, and the need to design metrics to evaluate explanations.

data mining, explanation, machine learning, (22 more...)

2207.06236

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Mississippi > Warren County > Vicksburg (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(9 more...)

Dablain, Damien, Krawczyk, Bartosz, Chawla, Nitesh

Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

Machine learning (ML) is playing an increasingly important role in rendering decisions that affect a broad range of groups in society. ML models inform decisions in criminal justice, the extension of credit in banking, and the hiring practices of corporations. This posits the requirement of model fairness, which holds that automated decisions should be equitable with respect to protected features (e.g., gender, race, or age) that are often under-represented in the data. We postulate that this problem of under-representation has a corollary to the problem of imbalanced data learning. This class imbalance is often reflected in both classes and protected features. For example, one class (those receiving credit) may be over-represented with respect to another class (those not receiving credit) and a particular group (females) may be under-represented with respect to another group (males). A key element in achieving algorithmic fairness with respect to protected groups is the simultaneous reduction of class and protected group imbalance in the underlying training data, which facilitates increases in both model accuracy and fairness. We discuss the importance of bridging imbalanced learning and group fairness by showing how key concepts in these fields overlap and complement each other; and propose a novel oversampling algorithm, Fair Oversampling, that addresses both skewed class distributions and protected features. Our method: (i) can be used as an efficient pre-processing algorithm for standard ML algorithms to jointly address imbalance and group equity; and (ii) can be combined with fairness-aware learning algorithms to improve their robustness to varying levels of class imbalance. Additionally, we take a step toward bridging the gap between fairness and imbalanced learning with a new metric, Fair Utility, that combines balanced accuracy with fairness.

artificial intelligence, fairness, machine learning, (17 more...)

2207.06084

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > North Macedonia > Skopje Statistical Region > Skopje Municipality > Skopje (0.04)
Asia > Singapore (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.67)
Banking & Finance > Credit (0.46)
Law > Civil Rights & Constitutional Law (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Makwana, Dhruv, Nag, Subhrajit, Susladkar, Onkar, Deshmukh, Gayatri, R, Sai Chandra Teja, Mittal, Sparsh, Mohan, C Krishna

ACLNet: An Attention and Clustering-based Cloud Segmentation Network

We propose a novel deep learning model named ACLNet, for cloud segmentation from ground images. ACLNet uses both deep neural network and machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone, "`a trous spatial pyramid pooling" (ASPP) to learn at multiple receptive fields, and "global attention module" (GAM) to extract finegrained details from the image. ACLNet also uses k-means clustering to extract cloud boundaries more precisely. ACLNet is effective for both daytime and nighttime images. It provides lower error rate, higher recall and higher F1-score than state-of-art cloud segmentation models. The source-code of ACLNet is available here: https://github.com/ckmvigil/ACLNet.

aclnet, feature map, segmentation, (16 more...)

doi: 10.1080/2150704X.2022.2097031

2207.06277

Country:

Asia > India > Uttarakhand > Roorkee (0.04)
Asia > India > Telangana > Hyderabad (0.04)
Asia > Japan (0.04)
Asia > India > Maharashtra > Pune (0.04)

Genre: Research Report (0.82)

Industry: Energy (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sun, Jiayin, Dong, Qiulei

Orthogonal-Coding-Based Feature Generation for Transductive Open-Set Recognition via Dual-Space Consistent Sampling

Open-set recognition (OSR) aims to simultaneously detect unknown-class samples and classify known-class samples. Most of the existing OSR methods are inductive methods, which generally suffer from the domain shift problem that the learned model from the known-class domain might be unsuitable for the unknown-class domain. Addressing this problem, inspired by the success of transductive learning for alleviating the domain shift problem in many other visual tasks, we propose an Iterative Transductive OSR framework, called IT-OSR, which implements three explored modules iteratively, including a reliability sampling module, a feature generation module, and a baseline update module. Specifically, at each iteration, a dual-space consistent sampling approach is presented in the explored reliability sampling module for selecting some relatively more reliable ones from the test samples according to their pseudo labels assigned by a baseline method, which could be an arbitrary inductive OSR method. Then, a conditional dual-adversarial generative network under an orthogonal coding condition is designed in the feature generation module to generate discriminative sample features of both known and unknown classes according to the selected test samples with their pseudo labels. Finally, the baseline method is updated for sample re-prediction in the baseline update module by jointly utilizing the generated features, the selected test samples with pseudo labels, and the training samples. Extensive experimental results on both the standard-dataset and the cross-dataset settings demonstrate that the derived transductive methods, by introducing two typical inductive OSR methods into the proposed IT-OSR framework, achieve better performances than 15 state-of-the-art methods in most cases.

module, recognition, test sample, (15 more...)

2207.05957

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool

Purwanto, Rizka, Pal, Arindam, Blair, Alan, Jha, Sanjay

In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites, in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68%, a high true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long term data in the future, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.

doi: 10.1109/TIFS.2022.3164212.

2207.10801

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Pombal, José, Cruz, André F., Bravo, João, Saleiro, Pedro, Figueiredo, Mário A. T., Bizarro, Pedro

Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions

In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.

disparity, fairness, prevalence, (14 more...)

2207.06273

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Banking & Finance > Financial Services (0.66)
Law Enforcement & Public Safety > Fraud (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Bándi, Péter, Balkenhol, Maschenka, van Dijk, Marcory, van Ginneken, Bram, van der Laak, Jeroen, Litjens, Geert

Domain adaptation strategies for cancer-independent detection of lymph node metastases

Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we will explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeated adaptation of networks from one cancer type to another to obtain multi-task metastasis detection networks. Last, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization.

dataset, metastasis, source task, (14 more...)

2207.06193

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Europe > Netherlands > Gelderland > Arnhem (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligenceJul-12-2022, 20:49:51 GMT

Artificial intelligence in cardiovascular medicine

Artificial intelligence (AI) is a rapidly evolving transdisciplinary field employing machine learning (ML) techniques, which aim to simulate human intuition to offer cost-effective and scalable solutions to better manage CVD. ML algorithms are increasingly being developed and applied in various facets of cardiovascular medicine, including and not limited to heart failure, electrophysiology, valvular heart disease and coronary artery disease. Within heart failure, AI algorithms can augment diagnostic capabilities and clinical decision-making through automated cardiac measurements. Occult cardiac disease is increasingly being identified using ML from diagnostic data. Improved diagnostic and prognostic capabilities using ML algorithms are enhancing clinical care of patients with valvular heart disease and coronary artery disease. The growth of AI techniques is not without inherent challenges, most important of which is the need for greater external validation through multicenter, prospective clinical trials.

algorithm, ml algorithm, neural network, (16 more...)

#artificialintelligence

Country:

North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
North America > United States > New York (0.04)
North America > United States > Minnesota > Olmsted County > Rochester (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

#artificialintelligenceJul-12-2022, 01:25:42 GMT

Confusion Matrices - Part 2

This post takes off where the last one left off and talks about building confusion matrices for multi-class classification problems. We load the Iris dataset, split it into training and test sets, build a K-Nearest Neighbors (k-NN) classifier that attempts to predict the class of Iris plant (setosa, versicolor, or virginica), and craft a confusion matrix using these predictions. We then describe some additional metrics, including the macro and micro precision, and discuss sklearn's classification_report, discussing the $F_1$ metric and delving slightly deeper into the $F_{0.5}$ In the end, we discuss the classification_report for the confusion matrix we built on the Iris dataset. Let's import the needed libraries and set the matplotlib and seaborn settings.

classifier, precision, time text, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligenceJul-12-2022, 01:25:06 GMT

DonorsChoose.org Application Screening

DonorsChoose.org is a United States-based nonprofit organization that allows individuals to donate directly to public school classroom projects. At any given time, there are thousands of classroom requests that can be brought to life with a gift of any amount. Right now, a large number of volunteers is needed to manually screen each submission before it's approved to be posted on the DonorsChoose.org The goal of the competition is to predict whether or not a DonorsChoose.org The competition dataset contains information from teachers' project applications to DonorsChoose.org

category, donorschoose, project proposal, (14 more...)

#artificialintelligence

Country:

North America > United States > New York > Bronx County > New York City (0.05)
North America > United States > California (0.05)

Industry: Education > Educational Setting (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)