 cost curve


Evaluating classification performance across operating contexts: A comparison of decision curve analysis and cost curves

arXiv.org Machine Learning

Classification models typically predict a score and use a decision threshold to produce a classification. Appropriate model evaluation should carefully consider the context in which a model will be used, including the relative value of correct classifications of positive versus negative examples, which determines the threshold that should be used. Decision curve analysis (DCA) and cost curves are model evaluation approaches that assess, respectively, the expected utility and the expected loss of prediction models across decision thresholds. We compared DCA and cost curves to determine how they are related, and what their strengths and limitations are. We demonstrate that decision curves are closely related to a specific type of cost curve called a Brier curve. Both curves are derived by assuming that model scores are calibrated and by setting the classification threshold from the relative value of correct positive and negative classifications, and the x-axes of the two curves are equivalent. Net benefit (used for DCA) and Brier loss (used for Brier curves) will always choose the same model as optimal at any given threshold. Across thresholds, differences in Brier loss are comparable, whereas differences in net benefit cannot be compared. Brier curves are more generally applicable (when a wider range of thresholds is plausible), and the area under the Brier curve is the Brier score. We demonstrate that the reference lines common in each space can be included in either, and suggest the upper envelope decision curve as a useful comparison for DCA, showing the possible gain in net benefit that could be achieved through recalibration alone.
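A minimal sketch of the two quantities at a single threshold, assuming one common Brier-curve convention: at cost proportion c the threshold is t = c, each false positive costs c, each false negative costs 1 - c, and the loss is scaled by 2. The paper's own normalization may differ, and the function names are hypothetical. Under these assumptions, a short rearrangement gives L(c) = 2(1 - c)(pi1 - NB(c)), where pi1 is the prevalence, so at any fixed threshold lower loss and higher net benefit pick the same model. The same convention makes the area under the Brier curve equal to the Brier score.

```python
import numpy as np

def net_benefit(y, s, pt):
    """Decision-curve net benefit at threshold pt: TP/n - (FP/n) * pt / (1 - pt)."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    pred = s > pt                      # assumed: positive when the score exceeds pt
    tp = np.mean(pred & (y == 1))      # TP / n
    fp = np.mean(pred & (y == 0))      # FP / n
    return tp - fp * pt / (1 - pt)

def brier_loss(y, s, c):
    """Cost-weighted loss at cost proportion c with threshold t = c (assumed convention):
    false positives cost c, false negatives cost 1 - c, scaled by 2."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    pred = s > c
    fp = np.mean(pred & (y == 0))
    fn = np.mean(~pred & (y == 1))
    return 2 * (c * fp + (1 - c) * fn)

def brier_curve_area(y, s):
    """Area under the Brier curve over c in [0, 1]; scores must lie in [0, 1]."""
    s = np.asarray(s, dtype=float)
    knots = np.unique(np.r_[0.0, s, 1.0])
    mids = (knots[:-1] + knots[1:]) / 2
    # Between consecutive scores the loss is linear in c, so the midpoint rule is exact.
    return sum(w * brier_loss(y, s, c) for c, w in zip(mids, np.diff(knots)))

y = np.array([0, 1, 1, 0, 1])
s = np.array([0.1, 0.8, 0.6, 0.4, 0.3])
print(brier_curve_area(y, s), np.mean((s - y) ** 2))  # both 0.172, up to float rounding
for c in (0.2, 0.5, 0.7):
    # Same threshold, two views: loss = 2(1 - c) * (prevalence - net benefit)
    print(c, brier_loss(y, s, c), 2 * (1 - c) * (y.mean() - net_benefit(y, s, c)))
```

Because the factor 2(1 - c) changes with c, this identity is also one way to see why a unit of net benefit at one threshold is not on the same scale as a unit at another.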


Quantum-Enhanced Classification of Brain Tumors Using DNA Microarray Gene Expression Profiles

arXiv.org Artificial Intelligence

DNA microarray technology enables the simultaneous measurement of the expression levels of thousands of genes, thereby facilitating the understanding of the molecular mechanisms underlying complex diseases such as brain tumors and the identification of diagnostic genetic signatures. To derive meaningful biological insights from the high-dimensional and complex gene features obtained through this technology, and to analyze gene properties in detail, classical AI-based approaches such as machine learning and deep learning are widely employed. However, these methods face various limitations in managing high-dimensional vector spaces and modeling the intricate relationships among genes. In particular, challenges such as hyperparameter tuning, computational cost, and high processing-power requirements can hinder their efficiency. To overcome these limitations, quantum computing and quantum AI approaches are gaining increasing attention. Leveraging quantum properties such as superposition and entanglement, quantum methods enable more efficient parallel processing of high-dimensional data and offer faster and more effective solutions to problems that are computationally demanding for classical methods. In this study, a novel model called "Deep VQC" is proposed, based on the Variational Quantum Classifier approach. Developed using microarray data containing 54,676 gene features, the model successfully classified four types of brain tumor (ependymoma, glioblastoma, medulloblastoma, and pilocytic astrocytoma), as well as healthy samples, with high accuracy. Furthermore, compared to classical ML algorithms, our model demonstrated either superior or comparable classification performance. These results highlight the potential of quantum AI methods as an effective and promising approach for the analysis and classification of complex structures such as brain tumors based on gene expression features.


Reviews: Cost efficient gradient boosting

Neural Information Processing Systems

The paper is similar to the work of Xu et al., 2012. The main differences are that the feature and evaluation costs are input-specific, that the evaluation cost depends on the number of tree splits, that the optimization approach is different (based on the Taylor expansion around T_{k-1}, as described in the XGBoost paper), and that the trees are grown best-first up to a maximum number of splits (instead of a maximum depth). The authors point out that their setup works whether feature cost or evaluation cost dominates, and they show experimental results for both settings.
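For readers unfamiliar with the XGBoost-style step the review mentions, here is a rough sketch. It is not the reviewed paper's method: it omits the feature- and evaluation-cost terms entirely and uses a single feature. Each sample contributes the gradient g_i and Hessian h_i of the loss at the previous ensemble's prediction; under the second-order expansion a leaf with sums G and H gets weight -G/(H + lambda), and a split's gain is 0.5 * [G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - G^2/(H + lambda)] (XGBoost also subtracts a complexity penalty, dropped here). Best-first growth then always expands the leaf with the largest gain until a split budget is spent, rather than growing level by level to a fixed depth. All function names and lambda = 1 are assumptions for illustration.

```python
import heapq
import numpy as np

def leaf_score(g, h, lam):
    """Structure score G^2 / (H + lambda) of a leaf with gradient sum G and Hessian sum H."""
    return g.sum() ** 2 / (h.sum() + lam)

def best_split(x, g, h, idx, lam):
    """Best threshold on one feature for samples idx: (gain, threshold, left, right) or None."""
    order = idx[np.argsort(x[idx], kind="mergesort")]
    xs = x[order]
    parent = leaf_score(g[order], h[order], lam)
    best = None
    for i in range(1, len(order)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between equal feature values
        left, right = order[:i], order[i:]
        gain = 0.5 * (leaf_score(g[left], h[left], lam)
                      + leaf_score(g[right], h[right], lam) - parent)
        if best is None or gain > best[0]:
            best = (gain, (xs[i - 1] + xs[i]) / 2, left, right)
    return best

def grow_best_first(x, g, h, max_splits, lam=1.0):
    """Repeatedly split the leaf with the largest gain until max_splits splits are made.
    Returns ([(sample_indices, leaf_weight), ...], number_of_splits)."""
    heap, leaves, counter = [], [], 0

    def push(idx):
        nonlocal counter
        cand = best_split(x, g, h, idx, lam)
        if cand is not None and cand[0] > 0:
            # min-heap on negative gain; the counter breaks ties so arrays are never compared
            heapq.heappush(heap, (-cand[0], counter, idx, cand))
        else:
            leaves.append(idx)  # no positive-gain split: final leaf
        counter += 1

    push(np.arange(len(x)))
    splits = 0
    while heap and splits < max_splits:
        _, _, _, (_, _, left, right) = heapq.heappop(heap)
        splits += 1
        push(left)
        push(right)
    leaves.extend(entry[2] for entry in heap)  # unexpanded candidates stay leaves
    # Optimal leaf weight under the second-order approximation: w = -G / (H + lambda)
    return [(idx, -g[idx].sum() / (h[idx].sum() + lam)) for idx in leaves], splits

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 10.0, 10.0])
# Squared loss at a previous prediction of 0: g = prediction - y, h = 1
leaves, n = grow_best_first(x, 0.0 - y, np.ones_like(y), max_splits=1)
print(n, [float(w) for _, w in leaves])
```

Capping the number of splits, rather than the depth, bounds the number of leaves (at most max_splits + 1) while letting the tree stay deep along whichever branch keeps paying off.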


Scheduling Conservation Designs for Maximum Flexibility via Network Cascade Optimization

Journal of Artificial Intelligence Research

One approach to conserving endangered species is to purchase and protect a set of land parcels in a way that maximizes the expected future population spread. Unfortunately, an ideal set of parcels may cost more than the immediate budget allows and must therefore be purchased incrementally. This raises the challenge of deciding how to schedule the parcel purchases in a way that maximizes the flexibility of budget usage while keeping the loss in population spread under control. In this paper, we introduce a formulation of this scheduling problem that does not rely on knowing the future budgets of an organization. In particular, we consider scheduling purchases in a way that achieves a population spread no less than desired but delays purchases as long as possible. Such schedules offer conservation planners maximum flexibility and use available budgets in the most efficient way. We develop the problem formally as a stochastic optimization problem over a network cascade model describing a commonly used model of population spread. Our solution approach is based on reducing the stochastic problem to a novel variant of the directed Steiner tree problem, which we call the set-weighted directed Steiner graph problem. We show that this problem is computationally hard, motivating the development of a primal-dual algorithm that computes both a feasible solution and a bound on the quality of an optimal solution. We evaluate the approach on both real and synthetic conservation data with a standard population spread model. The algorithm is shown to produce near-optimal results and is much more scalable than more generic off-the-shelf optimizers. Finally, we evaluate a variant of the algorithm to explore the trade-offs between budget savings and population growth.
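The abstract does not spell out its cascade model, so the following is only a hedged illustration of the kind of objective involved: a generic independent-cascade model in which the species can colonize only purchased parcels, with expected spread estimated by Monte Carlo. The graph format, the probabilities, and the function name are hypothetical, and neither the scheduling formulation nor the Steiner-graph reduction is reproduced here.

```python
import random

def expected_spread(edges, sources, purchased, n_samples=2000, seed=0):
    """Monte Carlo estimate of expected spread under an independent-cascade model.

    edges: dict mapping a parcel to a list of (neighbor, probability) pairs.
    Spread can only enter parcels in `purchased` (assumption: unprotected land is unusable).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        active = {u for u in sources if u in purchased}
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            # Each newly occupied parcel gets one chance to colonize each neighbor.
            for v, p in edges.get(u, []):
                if v in purchased and v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / n_samples

edges = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.8)], 2: [(3, 0.8)]}
print(expected_spread(edges, sources=[0], purchased={0, 1, 2, 3}))
print(expected_spread(edges, sources=[0], purchased={0, 1, 3}))  # parcel 2 not yet bought
```

Comparing estimates like these for different parcel sets shows how much expected spread is at stake when a purchase is left out or postponed, which is the quantity a schedule must keep above the desired level.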


Scheduling Conservation Designs via Network Cascade Optimization

AAAI Conferences

We introduce the problem of scheduling land purchases to conserve an endangered species in a way that achieves maximum population spread but delays purchases as long as possible, so that conservation planners retain maximum flexibility and use available budgets in the most efficient way. We develop the problem formally as a stochastic optimization problem over a network cascade model describing the population spread, and present a solution approach that reduces the stochastic problem to a novel variant of a Steiner tree problem. We give a primal-dual algorithm for the problem that computes both a feasible solution and a bound on the quality of an optimal solution. Our experiments, using actual conservation data and a standard diffusion model, show that the approach produces near optimal results and is much more scalable than more generic off-the-shelf optimizers.


Technical Note: Towards ROC Curves in Cost Space

arXiv.org Artificial Intelligence

ROC curves and cost curves are two popular ways of visualising classifier performance, finding appropriate thresholds according to the operating condition, and deriving useful aggregated measures such as the area under the ROC curve (AUC) or the area under the optimal cost curve. In this note we present some new findings and connections between ROC space and cost space, using the expected loss over a range of operating conditions. In particular, we show that ROC curves can be transferred to cost space through a natural way of choosing thresholds: the threshold is selected so that the proportion of positive predictions equals the operating condition (expressed either as a cost proportion or as a skew). We call these new curves ROC Cost Curves, and we demonstrate that the expected loss, as measured by the area under these curves, is linearly related to AUC. This opens up a series of new possibilities and clarifies the notion of a cost curve and its relation to ROC analysis. In addition, we show that for a classifier that assigns its scores in an evenly spaced way, these curves are equal to the Brier curves. As a result, this establishes the first clear connection between AUC and the Brier score.
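A minimal sketch of the rate-driven construction, under an assumed convention that may differ from the note's own normalization: at operating condition c the loss is Q(c) = 2[c * pi1 * FNR + (1 - c) * pi0 * FPR], so c is the share of cost placed on false negatives, and the threshold is chosen so that the proportion of positive predictions equals c. Under this convention, integrating along the empirical ROC polyline gives an area of pi0 * pi1 * (1 - 2 * AUC) + 1/3, which is linear in AUC as the abstract states. The function names are hypothetical.

```python
import numpy as np

def roc_points(y, s):
    """Empirical ROC vertices (FPR, TPR); tied scores form a single diagonal segment."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    order = np.argsort(-s, kind="mergesort")
    y, s = y[order], s[order]
    ends = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]  # last index of each tie group
    tp = np.cumsum(y)[ends]
    fp = (ends + 1) - tp
    return np.r_[0.0, fp / (len(y) - y.sum())], np.r_[0.0, tp / y.sum()]

def auc(fpr, tpr):
    """Trapezoidal area under the ROC polyline."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def roc_cost_area(y, s):
    """Area under the rate-driven cost curve (assumed convention, see lead-in)."""
    fpr, tpr = roc_points(y, s)
    pi1 = float(np.mean(y))
    pi0 = 1.0 - pi1

    def rate(f, t):
        # proportion of positive predictions at ROC point (f, t)
        return pi1 * t + pi0 * f

    def loss(f, t):
        c = rate(f, t)  # rate-driven choice: operating condition c = positive-prediction rate
        return 2 * (c * pi1 * (1 - t) + (1 - c) * pi0 * f)

    area = 0.0
    for f0, t0, f1, t1 in zip(fpr[:-1], tpr[:-1], fpr[1:], tpr[1:]):
        fm, tm = (f0 + f1) / 2, (t0 + t1) / 2
        # Along a straight ROC segment the loss is quadratic in c, so Simpson's rule is exact.
        area += (rate(f1, t1) - rate(f0, t0)) / 6 * (loss(f0, t0) + 4 * loss(fm, tm) + loss(f1, t1))
    return area

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
s = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
a = auc(*roc_points(y, s))
print(roc_cost_area(y, s), 0.25 * (1 - 2 * a) + 1 / 3)  # equal up to rounding (pi0 = pi1 = 0.5)
```

Under this convention a perfect ranker on balanced data scores 1/12, a random one 1/3, and a fully inverted one 7/12, matching the linear formula at AUC = 1, 0.5, and 0.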