AITopics

2007.00816

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Urology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Souza, Vinicius M. A., Reis, Denis M. dos, Maletzke, Andre G., Batista, Gustavo E. A. P. A.

Challenges in Benchmarking Stream Learning Algorithms with Real-world Data

Streaming data are increasingly present in real-world applications such as sensor measurements, satellite data feed, stock market, and financial data. The main characteristics of these applications are the online arrival of data observations at high speed and the susceptibility to changes in the data distributions due to the dynamic nature of real environments. The data stream mining community still faces some primary challenges and difficulties related to the comparison and evaluation of new proposals, mainly due to the lack of publicly available non-stationary real-world datasets. The comparison of stream algorithms proposed in the literature is not an easy task, as authors do not always follow the same recommendations, experimental evaluation procedures, datasets, and assumptions. In this paper, we mitigate problems related to the choice of datasets in the experimental evaluation of stream classifiers and drift detectors. To that end, we propose a new public data repository for benchmarking stream algorithms with real-world data. This repository contains the most popular datasets from literature and new datasets related to a highly relevant public health problem that involves the recognition of disease vector insects using optical sensors. The main advantage of these new datasets is the prior knowledge of their characteristics and patterns of changes to evaluate new adaptive algorithm proposals adequately. We also present an in-depth discussion about the characteristics, reasons, and issues that lead to different types of changes in data distribution, as well as a critical review of common problems concerning the current benchmark datasets available in the literature.

dataset, immunology, us government, (24 more...)

doi: 10.1007/s10618-020-00698-5

2005.00113

Country:

South America > Brazil (0.28)
Europe (0.28)
Oceania > Australia > New South Wales (0.14)
(3 more...)

Genre: Research Report (1.00)

Industry:

Materials > Chemicals (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Makhlouf, Karima, Zhioua, Sami, Palamidessi, Catuscia

On the Applicability of ML Fairness Notions

arXiv.org Artificial IntelligenceJun-30-2020

ML-based predictive systems are increasingly used to support decisions with a critical impact on individuals' lives such as college admission, job hiring, child custody, criminal risk assessment, etc. As a result, fairness emerged as an important requirement to guarantee that predictive systems do not discriminate against specific individuals or entire sub-populations, in particular, minorities. Given the inherent subjectivity of viewing the concept of fairness, several notions of fairness have been introduced in the literature. This paper is a survey of fairness notions that, unlike other surveys in the literature, addresses the question of "which notion of fairness is most suited to a given real-world scenario and why?". Our attempt to answer this question consists in (1) identifying the set of fairness-related characteristics of the real-world scenario at hand, (2) analyzing the behavior of each fairness notion, and then (3) fitting these two elements to recommend the most suitable fairness notion in every specific setup. The results are summarized in a decision diagram that can be used by practitioners and policy makers to navigate the relatively large catalogue of fairness notions.

artificial intelligence, fairness notion, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2006.16745

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(3 more...)

Genre:

Overview (0.88)
Research Report > New Finding (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.93)

He, Yang-Hui, Yau, Shing-Tung

Graph Laplacians, Riemannian Manifolds and their Machine-Learning

Graph Laplacians as well as related spectral inequalities and (co-)homology provide a foray into discrete analogues of Riemannian manifolds, providing a rich interplay between combinatorics, geometry and theoretical physics. We apply some of the latest techniques in data science such as supervised and unsupervised machine-learning and topological data analysis to the Wolfram database of some 8000 finite graphs in light of studying these correspondences. Encouragingly, we find that neural classifiers, regressors and networks can perform, with high efficiently and accuracy, a multitude of tasks ranging from recognizing graph Ricci-flatness, to predicting the spectral gap, to detecting the presence of Hamiltonian cycles, etc.

artificial intelligence, graph, machine learning, (18 more...)

2006.16619

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Oklahoma > Payne County > Cushing (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection

Li, Deqiang, Li, Qianmu

Malware remains a big threat to cyber security, calling for machine learning based malware detection. While promising, such detectors are known to be vulnerable to evasion attacks. Ensemble learning typically facilitates countermeasures, while attackers can leverage this technique to improve attack effectiveness as well. This motivates us to investigate which kind of robustness the ensemble defense or effectiveness the ensemble attack can achieve, particularly when they combat with each other. We thus propose a new attack approach, named mixture of attacks, by rendering attackers capable of multiple generative methods and multiple manipulation sets, to perturb a malware example without ruining its malicious functionality. This naturally leads to a new instantiation of adversarial training, which is further geared to enhancing the ensemble of deep neural networks. We evaluate defenses using Android malware detectors against 26 different attacks upon two practical datasets. Experimental results show that the new adversarial training significantly enhances the robustness of deep neural networks against a wide range of attacks, ensemble methods promote the robustness when base classifiers are robust enough, and yet ensemble attacks can evade the enhanced malware detectors effectively, even notably downgrading the VirusTotal service.

artificial intelligence, classifier, machine learning, (19 more...)

doi: 10.1109/TIFS.2020.3003571

2006.16545

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Deep Probabilistic Accelerated Evaluation: A Certifiable Rare-Event Simulation Methodology for Black-Box Autonomy

Arief, Mansur, Huang, Zhiyuan, Kumar, Guru Koushik Senthil, Bai, Yuanlu, He, Shengyi, Ding, Wenhao, Lam, Henry, Zhao, Ding

Evaluating the reliability of intelligent physical systems against rare catastrophic events poses a huge testing burden for real-world applications. Simulation provides a useful, if not unique, platform to evaluate the extremal risks of these AIenabled systems before their deployments. Importance Sampling (IS), while proven to be powerful for rare-event simulation, faces challenges in handling these systems due to their black-box nature that fundamentally undermines its efficiency guarantee. To overcome this challenge, we propose a framework called Deep Probabilistic Accelerated Evaluation (D-PrAE) to design IS, which leverages rare-event-set learning and and a new notion of efficiency certificate. D-PrAE combines the dominating point method with deep neural network classifiers to achieve superior estimation efficiency. We present theoretical guarantees and demonstrate the empirical effectiveness of D-PrAE via examples on the safety-testing of self-driving algorithms that are beyond the reach of classical variance reduction techniques.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

2006.15722

Country:

North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.81)

Industry: Transportation > Air (0.61)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
(2 more...)

Kendall transformation

Kursa, Miron Bartosz

Kendall transformation is a conversion of an ordered feature into a vector of pairwise order relations between individual values. This way, it preserves ranking of observations and represents it in a categorical form. Such transformation allows for generalisation of methods requiring strictly categorical input, especially in the limit of small number of observations, when discretisation becomes problematic. In particular, many approaches of information theory can be directly applied to Kendall-transformed continuous data without relying on differential entropy or any additional parameters. Moreover, by filtering information to this contained in ranking, Kendall transformation leads to a better robustness at a reasonable cost of dropping sophisticated interactions which are anyhow unlikely to be correctly estimated. In bivariate analysis, Kendall transformation can be related to popular non-parametric methods, showing the soundness of the approach. The paper also demonstrates its efficiency in multivariate problems, as well as provides an example analysis of a real-world data.

entropy, information, kendall transformation, (14 more...)

2006.15991

Country: Europe > Poland > Masovia Province > Warsaw (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Deep Doubly Supervised Transfer Network for Diagnosis of Breast Cancer with Imbalanced Ultrasound Imaging Modalities

Xiangmin, Han, Jun, Wang, Weijun, Zhou, Cai, Chang, Shihui, Ying, Jun, Shi

Elastography ultrasound (EUS) provides additional bio-mechanical information about lesion for B-mode ultrasound (BUS) in the diagnosis of breast cancers. However, joint utilization of both BUS and EUS is not popular due to the lack of EUS devices in rural hospitals, which arouses a novel modality imbalance problem in computer-aided diagnosis (CAD) for breast cancers. Current transfer learning (TL) pay little attention to this special issue of clinical modality imbalance, that is, the source domain (EUS modality) has fewer labeled samples than those in the target domain (BUS modality). Moreover, these TL methods cannot fully use the label information to explore the intrinsic relation between two modalities and then guide the promoted knowledge transfer. To this end, we propose a novel doubly supervised TL network (DDSTN) that integrates the Learning Using Privileged Information (LUPI) paradigm and the Maximum Mean Discrepancy (MMD) criterion into a unified deep TL framework. The proposed algorithm can not only make full use of the shared labels to effectively guide knowledge transfer by LUPI paradigm, but also perform additional supervised transfer between unpaired data. We further introduce the MMD criterion to enhance the knowledge transfer. The experimental results on the breast ultrasound dataset indicate that the proposed DDSTN outperforms all the compared state-of-the-art algorithms for the BUSbased CAD. Keywords: Ultrasound imaging, Breast cancer, Deep doubly supervised transfer learning, Support vector machine plus, Maximum mean discrepancy.

algorithm, artificial intelligence, machine learning, (17 more...)

2007.06634

Country:

Asia > China > Shanghai > Shanghai (0.06)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Random Partitioning Forest for Point-Wise and Collective Anomaly Detection -- Application to Intrusion Detection

Marteau, Pierre-Francois

In this paper, we propose DiFF-RF, an ensemble approach composed of random partitioning binary trees to detect point-wise and collective (as well as contextual) anomalies. Thanks to a distance-based paradigm used at the leaves of the trees, this semi-supervised approach solves a drawback that has been identified in the isolation forest (IF) algorithm. Moreover, taking into account the frequencies of visits in the leaves of the random trees allows to significantly improve the performance of DiFF-RF when considering the presence of collective anomalies. DiFF-RF is fairly easy to train, and excellent performance can be obtained by using a simple semi-supervised procedure to setup the extra hyper-parameter that is introduced. We first evaluate DiFF-RF on a synthetic data set to i) verify that the limitation of the IF algorithm is overcome, ii) demonstrate how collective anomalies are actually detected and iii) to analyze the effect of the meta-parameters it involves. We assess the DiFF-RF algorithm on a large set of datasets from the UCI repository, as well as two benchmarks related to intrusion detection applications. Our experiments show that DiFF-RF almost systematically outperforms the IF algorithm, but also challenges the one-class SVM baseline and a deep learning variational auto-encoder architecture. Furthermore, our experience shows that DiFF-RF can work well in the presence of small-scale learning data, which is conversely difficult for deep neural architectures. Finally, DiFF-RF is computationally efficient and can be easily parallelized on multi-core architectures.

anomaly, data mining, machine learning, (18 more...)

2006.16801

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Oceania > Australia > New South Wales (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Bennette, Walter, Maurer, Karsten, Sisti, Sean

Harnessing Adversarial Distances to Discover High-Confidence Errors

Given a deep neural network image classification model that we treat as a black box, and an unlabeled evaluation dataset, we develop an efficient strategy by which the classifier can be evaluated. Randomly sampling and labeling instances from an unlabeled evaluation dataset allows traditional performance measures like accuracy, precision, and recall to be estimated. However, random sampling may miss rare errors for which the model is highly confident in its prediction, but wrong. These high-confidence errors can represent costly mistakes, and therefore should be explicitly searched for. Past works have developed search techniques to find classification errors above a specified confidence threshold, but ignore the fact that errors should be expected at confidence levels anywhere below 100\%. In this work, we investigate the problem of finding errors at rates greater than expected given model confidence. Additionally, we propose a query-efficient and novel search technique that is guided by adversarial perturbations to find these mistakes in black box models. Through rigorous empirical experimentation, we demonstrate that our Adversarial Distance search discovers high-confidence errors at a rate greater than expected given model confidence.

artificial intelligence, machine learning, prediction, (19 more...)

2006.16055

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Ohio > Butler County > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)