Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights

Cortez, Rommel, Krishnamoorthy, Bala

arXiv.org Machine Learning

Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations often have distinct weights, and none of these measures is sensitive to such varying weights. We propose a new weighted Pearson-Matthews Correlation Coefficient (MCC) for binary classification, as well as weighted versions of related multiclass measures. Like the standard MCC, the weighted MCC varies between $-1$ and $1$. Crucially, the weighted MCC values are higher for classifiers that perform better on highly weighted observations, and hence are able to distinguish such classifiers from ones with similar overall performance that perform better on observations with low weights. Furthermore, we prove that the weighted measures are robust with respect to the choice of weights in a precise manner: if the weights are changed by at most $ε$, the value of the weighted measure changes by at most a factor of $ε$ in the binary case and a factor of $ε^2$ in the multiclass case. Our computations demonstrate that the weighted measures clearly identify classifiers that perform better on highly weighted observations, while the unweighted measures remain completely indifferent to the choice of weights.
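The abstract does not reproduce the definition, but the core idea can be sketched: replace each unit count in the binary confusion matrix by the observation's weight, then apply the usual MCC formula. The function below is an illustrative sketch under that assumption, not the paper's exact construction.

```python
import math

def weighted_mcc(y_true, y_pred, weights):
    """Binary MCC where each observation contributes its weight
    (rather than a unit count) to the confusion-matrix entries.
    Illustrative sketch; the paper's exact definition may differ."""
    tp = fp = tn = fn = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        if t == 1 and p == 1:
            tp += w
        elif t == 0 and p == 1:
            fp += w
        elif t == 0 and p == 0:
            tn += w
        else:
            fn += w
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```

With unit weights this reduces to the standard MCC, so two classifiers that make one error each score identically; once the weights differ, the classifier that errs on a low-weight observation scores higher, which is exactly the sensitivity the abstract describes.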



Response to Reviewers 1 and 2 — (Q1) The scope of the negative result is unclear

Neural Information Processing Systems

Thank you for the helpful comments and suggestions; we will address the concerns raised by the reviewers. We discuss the difficulty of satisfying Corollary 5 when the problem becomes multiclass in Lines 144-158. This makes it easier to tune the hyperparameters to satisfy the necessary condition, as illustrated in Eq. (9), for ... We checked Eq. (5) in "Learning Confidence for Out-of-Distribution Detection in Neural Networks". Regarding the problem addressed by "On calibration of modern neural ...


Detecting AI-generated Artwork

Li, Meien, Stamp, Mark

arXiv.org Artificial Intelligence

The high efficiency and quality of artwork generated by Artificial Intelligence (AI) has created new concerns and challenges for human artists. In particular, recent improvements in generative AI have made it difficult for people to distinguish between human-generated and AI-generated art. In this research, we consider the potential utility of various types of Machine Learning (ML) and Deep Learning (DL) models in distinguishing AI-generated artwork from human-generated artwork. We focus on three challenging artistic styles, namely, baroque, cubism, and expressionism. The learning models we test are Logistic Regression (LR), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN). Our best experimental results yield a multiclass accuracy of 0.8208 over six classes, and an impressive accuracy of 0.9758 for the binary classification problem of distinguishing AI-generated from human-generated art.


Composite Multiclass Losses

Neural Information Processing Systems

We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a "proper composite loss", which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We determine the stationarity condition, Bregman representation, order-sensitivity, existence and uniqueness of the composite representation for multiclass losses. We subsume existing results on "classification calibration" by relating it to properness, and show that the simple integral representation for binary proper losses cannot be extended to multiclass losses.
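For concreteness, the notion of a proper composite loss can be written out as follows; the notation here is illustrative and may differ from the paper's.

```latex
% Proper composite multiclass loss (illustrative notation).
% \Delta^n is the probability simplex over n classes; \psi is a link.
\ell(y, v) \;=\; \lambda\bigl(y, \psi^{-1}(v)\bigr),
\qquad \psi : \Delta^n \to \mathcal{V} \ \text{invertible}.
% The loss \lambda is proper when reporting the true class
% probabilities minimizes the expected loss:
\sum_{y=1}^{n} p_y \, \lambda(y, p)
\;\le\;
\sum_{y=1}^{n} p_y \, \lambda(y, q)
\qquad \text{for all } p, q \in \Delta^n .
```

The link $\psi$ lets the prediction live in an unconstrained space $\mathcal{V}$ (e.g., real-valued scores) while properness of $\lambda$ ensures the composite loss is minimized at the true class-probability vector.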


ECG Feature Importance Rankings: Cardiologists vs. Algorithms

Mehari, Temesgen, Sundar, Ashish, Bosnjakovic, Alen, Harris, Peter, Williams, Steven E., Loewe, Axel, Doessel, Olaf, Nagel, Claudia, Strodthoff, Nils, Aston, Philip J.

arXiv.org Artificial Intelligence

On the other hand, it is quite conceivable that a simple binary classification of healthy vs. a specific pathology could be successfully achieved by using only a reduced subset of the complete list of diagnostic conditions. However, we consider it appropriate to study the simplest case first. A study of multiclass feature importance algorithms with all four of the above classes has been undertaken as a separate study [4]. ... Diagnoses are made on the basis of a multitude of ECG features, which consist mainly of time intervals between certain fiducial points on the ECG, amplitudes of prominent features, or the morphology of ECG segments. For each pathology, the relevant criteria for specific features are well documented [1], [2], although there may be minor differences between one ...


A Comparative Evaluation of Quantification Methods

Schumacher, Tobias, Strohmaier, Markus, Lemmerich, Florian

arXiv.org Artificial Intelligence

Quantification represents the problem of predicting class distributions in a given target set. It also represents a growing research field in supervised machine learning, for which a large variety of different algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not available yet. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To consider a broad range of different scenarios for binary as well as multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods including the Median Sweep and the DyS framework that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. More generally, we find that the performance on multiclass quantification is inferior to the results obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers to identify opportunities for future research.
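The abstract names several quantification methods without defining them. As background, the simplest baseline that methods like Median Sweep refine, binary Adjusted Classify & Count, can be sketched as follows; this is an illustrative implementation, not code from the paper.

```python
def classify_and_count(preds):
    """Naive prevalence estimate: the fraction of the target set
    that the classifier labels positive."""
    return sum(preds) / len(preds)

def adjusted_count(preds, tpr, fpr):
    """Adjusted Classify & Count: corrects the naive estimate using
    the classifier's true- and false-positive rates (estimated, e.g.,
    by cross-validation on the training set), since the expected raw
    count is  p * tpr + (1 - p) * fpr  for true prevalence p.
    The result is clipped to [0, 1]."""
    cc = classify_and_count(preds)
    if tpr == fpr:  # correction undefined; fall back to the naive count
        return cc
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))
```

If a classifier with tpr = 0.8 and fpr = 0.2 labels 44% of a target set positive, inverting the expected-count equation recovers a prevalence of (0.44 - 0.2) / 0.6 = 0.4, even though the raw count 0.44 is biased.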


On Possibility and Impossibility of Multiclass Classification with Rejection

Ni, Chenri, Charoenphakdee, Nontawat, Honda, Junya, Sugiyama, Masashi

arXiv.org Machine Learning

We investigate the problem of multiclass classification with rejection, where a classifier can choose not to make a prediction to avoid critical misclassification. We consider two approaches for this problem: a traditional one based on confidence scores and a more recent one based on simultaneous training of a classifier and a rejector. An existing method in the former approach focuses on a specific class of losses and its empirical performance is not very convincing. In this paper, we propose confidence-based rejection criteria for multiclass classification, which can handle more general losses and guarantee calibration to the Bayes-optimal solution. The latter approach is relatively new and has been available only for the binary case, to the best of our knowledge. Our second contribution is to prove that calibration to the Bayes-optimal solution is almost impossible by this approach in the multiclass case. Finally, we conduct experiments to validate the relevance of our theoretical findings.
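The confidence-score approach can be illustrated with the classical rejection rule of Chow for the 0-1-c loss: predict the top class when its posterior probability is at least 1 - c, and reject otherwise. This is a minimal sketch of that textbook rule, not the criteria proposed in the paper.

```python
def predict_with_reject(probs, c):
    """Chow's rule for the 0-1-c loss, where c in [0, 1/2] is the cost
    of rejecting: predict the most probable class unless its posterior
    probability falls below 1 - c, in which case reject (return None).
    `probs` is a list of class posteriors summing to 1."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= 1.0 - c else None
```

With c = 0.2 the rule predicts only when the top posterior reaches 0.8; a confident prediction like (0.9, 0.05, 0.05) is returned, while an ambiguous one like (0.4, 0.35, 0.25) is rejected. Such a threshold is Bayes-optimal for 0-1-c loss when the posteriors are exact, which is precisely the calibration property the confidence-based approach aims to preserve under surrogate losses.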