 Performance Analysis


A Novel Community Detection Based Genetic Algorithm for Feature Selection

arXiv.org Machine Learning

Feature selection is an essential data preprocessing stage in data mining. Its core principle is to pick a subset of the available features by excluding those with almost no predictive information as well as highly correlated, redundant ones. In the past several years, a variety of meta-heuristic methods have been introduced to eliminate redundant and irrelevant features as much as possible from high-dimensional datasets. One of the main disadvantages of present meta-heuristic approaches is that they often neglect the correlation between the selected features. In this article, for the purpose of feature selection, the authors propose a genetic algorithm based on community detection, which works in three steps. In the first step, the feature similarities are calculated. In the second step, the features are grouped into clusters by community detection algorithms. In the third step, features are picked by a genetic algorithm with a new community-based repair operation. The performance of the presented approach was analyzed on nine benchmark classification problems. The authors also compared the proposed approach with four existing feature selection algorithms. The findings indicate that the new approach consistently yields improved classification accuracy.


Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

arXiv.org Machine Learning

In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. We have designed this challenge as the first benchmark of ASD research, which includes a large-scale dataset, evaluation metrics, and a simple baseline system. We received 117 submissions from 40 teams, and several novel approaches have been developed as a result of this challenge. On the basis of the analysis of the evaluation results, we discuss two new approaches and their problems.


ROC Curve and AUC -- Explained

#artificialintelligence

The ROC (receiver operating characteristic) curve and AUC (area under the curve) are performance measures that provide a comprehensive evaluation of classification models. AUC turns the ROC curve into a numeric representation of performance for a binary classifier. AUC is the area under the ROC curve and takes a value between 0 and 1. AUC indicates how successful a model is at separating positive and negative classes. Before going into detail, let's first explain the confusion matrix and how different threshold values change its outcome. A confusion matrix is not a metric to evaluate a model, but it provides insight into the predictions.
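As a minimal sketch of the two ideas in this summary, the following pure-Python example counts confusion-matrix entries at two thresholds and computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly. The function names and the toy labels and scores are illustrative assumptions, not taken from the article.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc_pairwise(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
auc_labels = [0, 0, 1, 1]
auc_scores = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold trades false negatives for false positives.
strict = confusion_counts(auc_labels, auc_scores, 0.5)
loose = confusion_counts(auc_labels, auc_scores, 0.3)
auc_value = auc_pairwise(auc_labels, auc_scores)
print(strict, loose, auc_value)
```

The pairwise form is quadratic in the number of examples, so it is only meant to make the definition concrete on small inputs.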


Repeated k-Fold Cross-Validation for Model Evaluation in Python

#artificialintelligence

The k-fold cross-validation procedure is a standard method for estimating the performance of a machine learning algorithm or configuration on a dataset. A single run of the k-fold cross-validation procedure may result in a noisy estimate of model performance, because different splits of the data may produce very different results. Repeated k-fold cross-validation provides a way to improve the estimate of a machine learning model's performance. This involves simply repeating the cross-validation procedure multiple times and reporting the mean result across all folds from all runs.
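The procedure described above can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the article's code: the helper names, the synthetic one-dimensional data, and the toy midpoint classifier are all hypothetical, and in practice a library routine would normally be used instead.

```python
import random
from statistics import mean


def kfold_indices(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_kfold_scores(X, y, fit, score, k=5, repeats=3, seed=0):
    """Run k-fold cross-validation `repeats` times with a fresh shuffle each time."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in kfold_indices(len(X), k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            scores.append(
                score(model, [X[i] for i in test_idx], [y[i] for i in test_idx])
            )
    return scores


def fit_midpoint(X, y):
    """Toy model: a threshold halfway between the two class means."""
    m0 = mean(x for x, label in zip(X, y) if label == 0)
    m1 = mean(x for x, label in zip(X, y) if label == 1)
    return (m0 + m1) / 2


def accuracy(threshold, X, y):
    """Fraction of samples classified correctly by the threshold model."""
    preds = [1 if x > threshold else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical data: two overlapping Gaussian classes in one dimension.
data_rng = random.Random(42)
cv_X = [data_rng.gauss(0, 1) for _ in range(50)] + [data_rng.gauss(2, 1) for _ in range(50)]
cv_y = [0] * 50 + [1] * 50

cv_scores = repeated_kfold_scores(cv_X, cv_y, fit_midpoint, accuracy, k=5, repeats=3, seed=1)
print(f"mean accuracy over {len(cv_scores)} folds: {mean(cv_scores):.3f}")
```

With k=5 and repeats=3 the reported mean is taken over 15 fold scores, which is what smooths out the split-to-split noise of a single run.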


A Technique for Determining Relevance Scores of Process Activities using Graph-based Neural Networks

arXiv.org Artificial Intelligence

A central role in process improvement is played by the process analyst [2], who is responsible for 'monitoring, measuring, and providing feedback on the performance of a business process' [3, p.45]. The ongoing implementation of information systems in organisations, along with the subsequently enhanced availability of event log data, has enabled process analysts to discover as-is models of processes with process mining with relative ease [4]. However, the crucial challenge lies in identifying potential areas for process improvements (i.e., process analysis) with respect to a strategic goal [5]; this requires analytical capabilities such as Pareto or root cause analysis [2]. A business process can be defined as a 'completely closed, timely, and logical sequence of activities' [6, p.3] that realises an outcome valuable to a customer [7]. The effectiveness (i.e., customer value) and efficiency (e.g., timely, logical sequence, resource utilisation) of a business process are monitored using key performance indicators (KPIs) as aggregated measures of process outcomes; in the context of BPM, these are often referred to as process performance indicators (PPIs) [8]. Thus, to improve a business process, it is essential for a process analyst to understand the relevance of individual process activities in terms of their impact on the dimensions expressed by these performance measures.


ROC Curve in Machine Learning

#artificialintelligence

The Receiver Operating Characteristic (ROC) curve is a popular tool used with binary classifiers. It is very similar to the precision/recall curve, but instead of plotting precision versus recall, the ROC curve plots the true positive rate (TPR, another name for recall) against the false positive rate (FPR). The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to 1 minus the true negative rate (TNR), which is the ratio of negative instances that are correctly classified as negative.
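To make the definitions above concrete, here is a small pure-Python sketch that sweeps the decision threshold over the observed scores and records one (FPR, TPR) point per threshold, then checks that FPR equals 1 minus TNR at one of them. The function names and the toy data are assumptions made for illustration.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs, starting at (0, 0), one per distinct threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Go from the strictest threshold (few positives predicted) to the loosest.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def true_negative_rate(y_true, scores, thr):
    """Share of negative instances that are correctly predicted negative."""
    n_neg = sum(1 for y in y_true if y == 0)
    tn = sum(1 for y, s in zip(y_true, scores) if s < thr and y == 0)
    return tn / n_neg


# Hypothetical toy data: 1 = positive class, 0 = negative class.
roc_labels = [0, 0, 1, 1]
roc_scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(roc_labels, roc_scores)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")

# At threshold 0.4 the FPR is the third point on the curve.
fpr_at_04 = curve[2][0]
tnr_at_04 = true_negative_rate(roc_labels, roc_scores, 0.4)
```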


Precision and Recall in Machine Learning

#artificialintelligence

In Machine Learning, Precision and Recall are two of the most important metrics for Model Evaluation. Precision is the percentage of the results returned by your model that are relevant. Recall is the percentage of all relevant results that are correctly classified by your machine learning algorithm. In this article, I will show you how you can apply Precision and Recall to evaluate the performance of your Machine Learning model. See Full Article -- thecleverprogrammer.com.
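The two definitions above translate directly into counts of true positives, false positives, and false negatives. The following is a minimal sketch with hypothetical labels and predictions; returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the article.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed convention: report 0.0 when the ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical example: 4 relevant items, 3 items returned, 2 of them relevant.
pr_true = [1, 1, 1, 1, 0, 0, 0, 0]
pr_pred = [1, 1, 0, 0, 1, 0, 0, 0]

pr_precision, pr_recall = precision_recall(pr_true, pr_pred)
print(f"precision={pr_precision:.3f} recall={pr_recall:.3f}")
```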


A critical analysis of metrics used for measuring progress in artificial intelligence

arXiv.org Artificial Intelligence

Comparing model performances on benchmark datasets is an integral part of measuring and driving progress in artificial intelligence. A model's performance on a benchmark dataset is commonly assessed based on a single or a small set of performance metrics. While this enables quick comparisons, it may also entail the risk of inadequately reflecting model performance if the metric does not sufficiently cover all performance characteristics. Currently, it is unknown to what extent this might impact current benchmarking efforts. To address this question, we analysed the current landscape of performance metrics based on data covering 3867 machine learning model performance results from the web-based open platform 'Papers with Code'. Our results suggest that the large majority of metrics currently used to evaluate classification AI benchmark tasks have properties that may result in an inadequate reflection of a classifier's performance, especially when used with imbalanced datasets. While alternative metrics that address problematic properties have been proposed, they are currently rarely applied as performance metrics in benchmarking tasks. Finally, we noticed that the reporting of metrics was partly inconsistent and partly unspecific, which may lead to ambiguities when comparing model performances.
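The concern about imbalanced datasets can be illustrated with a small, hypothetical example: a classifier that always predicts the majority class scores high on plain accuracy while detecting no positives at all. Balanced accuracy is used below as one commonly cited alternative; choosing it here is an assumption for illustration and not necessarily the metric the paper recommends.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for a binary problem (labels 0 and 1)."""
    recalls = []
    for cls in (0, 1):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
imb_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority (negative) class.
imb_pred = [0] * 100

imb_accuracy = accuracy_score(imb_true, imb_pred)
imb_balanced = balanced_accuracy(imb_true, imb_pred)
print(f"accuracy={imb_accuracy:.2f} balanced_accuracy={imb_balanced:.2f}")
```

The gap between the two numbers shows how a single headline metric can hide the fact that the minority class is never detected.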


Machine Learning Fairness in Justice Systems: Base Rates, False Positives, and False Negatives

arXiv.org Artificial Intelligence

Machine learning best practice statements have proliferated, but there is a lack of consensus on what the standards should be. For fairness standards in particular, there is little guidance on how fairness might be achieved in practice. Specifically, fairness in errors (both false negatives and false positives) can pose a problem of how to set weights, how to make unavoidable tradeoffs, and how to judge models that present different kinds of errors across racial groups. This paper considers the consequences of having higher rates of false positives for one racial group and higher rates of false negatives for another racial group. The paper examines how different errors in justice settings can present problems for machine learning applications, the limits of computation for resolving tradeoffs, and how solutions might have to be crafted through courageous conversations with leadership, line workers, stakeholders, and impacted communities.


Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection

arXiv.org Machine Learning

Network attacks have become very prevalent, and their rate is growing tremendously. Both organizations and individuals are now concerned about the confidentiality, integrity, and availability of their critical information, which are often impacted by network attacks. To that end, several machine learning-based intrusion detection methods have been developed to secure network infrastructure from such attacks. In this paper, an effective anomaly detection framework is proposed that uses Bayesian Optimization to tune the parameters of Support Vector Machine with Gaussian Kernel (SVM-RBF), Random Forest (RF), and k-Nearest Neighbor (k-NN) algorithms. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, low false alarm rate, and recall.