Goto

Collaborating Authors

 Performance Analysis


Evaluation Measures for Quantification: An Axiomatic Approach

arXiv.org Artificial Intelligence

Quantification is the task of estimating, given a set $\sigma$ of unlabelled items and a set of classes $\mathcal{C}=\{c_{1}, \ldots, c_{|\mathcal{C}|}\}$, the prevalence (or `relative frequency') in $\sigma$ of each class $c_{i}\in \mathcal{C}$. While quantification may in principle be solved by classifying each item in $\sigma$ and counting how many such items have been labelled with $c_{i}$, it has long been shown that this `classify and count' (CC) method yields suboptimal quantification accuracy. As a result, quantification is no longer considered a mere byproduct of classification, and has evolved as a task of its own. While the scientific community has devoted a lot of attention to devising more accurate quantification methods, it has not devoted much to discussing what properties an \emph{evaluation measure for quantification} (EMQ) should enjoy, and which EMQs should be adopted as a result. This paper lies down a number of interesting properties that an EMQ may or may not enjoy, discusses if (and when) each of these properties is desirable, surveys the EMQs that have been used so far, and discusses whether they enjoy or not the above properties. As a result of this investigation, some of the EMQs that have been used in the literature turn out to be severely unfit, while others emerge as closer to what the quantification community actually needs. However, a significant result is that no existing EMQ satisfies all the properties identified as desirable, thus indicating that more research is needed in order to identify (or synthesize) a truly adequate EMQ.


One-shot Learning for iEEG Seizure Detection Using End-to-end Binary Operations: Local Binary Patterns with Hyperdimensional Computing

arXiv.org Machine Learning

This paper presents an efficient binarized algorithm for both learning and classification of human epileptic seizures from intracranial electroencephalography (iEEG). The algorithm combines local binary patterns with brain-inspired hyperdimensional computing to enable end-to-end learning and inference with binary operations. The algorithm first transforms iEEG time series from each electrode into local binary pattern codes. Then atomic high-dimensional binary vectors are used to construct composite representations of seizures across all electrodes. For the majority of our patients (10 out of 16), the algorithm quickly learns from one or two seizures (i.e., one-/few-shot learning) and perfectly generalizes on 27 further seizures. For other patients, the algorithm requires three to six seizures for learning. Overall, our algorithm surpasses the state-of-the-art methods for detecting 65 novel seizures with higher specificity and sensitivity, and lower memory footprint.


A Bandit Approach to Multiple Testing with False Discovery Control

arXiv.org Machine Learning

We propose an adaptive sampling approach for multiple testing which aims to maximize statistical power while ensuring anytime false discovery control. We consider $n$ distributions whose means are partitioned by whether they are below or equal to a baseline (nulls), versus above the baseline (actual positives). In addition, each distribution can be sequentially and repeatedly sampled. Inspired by the multi-armed bandit literature, we provide an algorithm that takes as few samples as possible to exceed a target true positive proportion (i.e. proportion of actual positives discovered) while giving anytime control of the false discovery proportion (nulls predicted as actual positives). Our sample complexity results match known information theoretic lower bounds and through simulations we show a substantial performance improvement over uniform sampling and an adaptive elimination style algorithm. Given the simplicity of the approach, and its sample efficiency, the method has promise for wide adoption in the biological sciences, clinical testing for drug discovery, and online A/B/n testing problems.


Discriminative but Not Discriminatory: A Comparison of Fairness Definitions under Different Worldviews

arXiv.org Machine Learning

We mathematically compare three competing definitions of group-level nondiscrimination: demographic parity, equalized odds, and calibration. Using the theoretical framework of Friedler et al., we study the properties of each definition under various worldviews, which are assumptions about how, if at all, the observed data is biased. We prove that different worldviews call for different definitions of fairness, and we specify when it is appropriate to use demographic parity and equalized odds. In addition, we argue that calibration is unsuitable for the purpose of ensuring nondiscrimination. Finally, we define a worldview that is more realistic than the previously considered ones, and we introduce a new notion of fairness that is suitable for this worldview.


Causal Discovery by Telling Apart Parents and Children

arXiv.org Machine Learning

We consider the problem of inferring the directed, causal graph from observational data, assuming no hidden confounders. We take an information theoretic approach, and make three main contributions. First, we show how through algorithmic information theory we can obtain SCI, a highly robust, effective and computationally efficient test for conditional independence---and show it outperforms the state of the art when applied in constraint-based inference methods such as stable PC. Second, building upon on SCI, we show how to tell apart the parents and children of a given node based on the algorithmic Markov condition. We give the Climb algorithm to efficiently discover the directed, causal Markov blanket---and show it is at least as accurate as inferring the global network, while being much more efficient. Last, but not least, we detail how we can use the Climb score to direct those edges that state of the art causal discovery algorithms based on PC or GES leave undirected---and show this improves their precision, recall and F1 scores by up to 20%.


A Direct Approach for Sparse Quadratic Discriminant Analysis

arXiv.org Machine Learning

Quadratic discriminant analysis (QDA) is a standard tool for classification due to its simplicity and flexibility. Because the number of its parameters scales quadratically with the number of the variables, QDA is not practical, however, when the dimensionality is relatively large. To address this, we propose a novel procedure named DA-QDA for QDA in analyzing high-dimensional data. Formulated in a simple and coherent framework, DA-QDA aims to directly estimate the key quantities in the Bayes discriminant function including quadratic interactions and a linear index of the variables for classification. Under appropriate sparsity assumptions, we establish consistency results for estimating the interactions and the linear index, and further demonstrate that the misclassification rate of our procedure converges to the optimal Bayes risk, even when the dimensionality is exponentially high with respect to the sample size. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed for finding interactions, which is much faster than its competitor in the literature. The promising performance of DA-QDA is illustrated via extensive simulation studies and the analysis of four real datasets.


Cross validation residuals for generalised least squares and other correlated data models

arXiv.org Machine Learning

Cross validation residuals are well known for the ordinary least squares model. Here leave-M-out cross validation is extended to generalised least squares. The relationship between cross validation residuals and Cook's distance is demonstrated, in terms of an approximation to the difference in the generalised residual sum of squares for a model fit to all the data (training and test) and a model fit to a reduced dataset (training data only). For generalised least squares, as for ordinary least squares, there is no need to refit the model to reduced size datasets as all the values for K fold cross validation are available after fitting the model to all the data.


Predicting Smoking Events with a Time-Varying Semi-Parametric Hawkes Process Model

arXiv.org Machine Learning

Health risks from cigarette smoking -- the leading cause of preventable death in the United States -- can be substantially reduced by quitting. Although most smokers are motivated to quit, the majority of quit attempts fail. A number of studies have explored the role of self-reported symptoms, physiologic measurements, and environmental context on smoking risk, but less work has focused on the temporal dynamics of smoking events, including daily patterns and related nicotine effects. In this work, we examine these dynamics and improve risk prediction by modeling smoking as a self-triggering process, in which previous smoking events modify current risk. Specifically, we fit smoking events self-reported by 42 smokers to a time-varying semi-parametric Hawkes process (TV-SPHP) developed for this purpose. Results show that the TV-SPHP achieves superior prediction performance compared to related and existing models, with the incorporation of time-varying predictors having greatest benefit over longer prediction windows. Moreover, the impact function illustrates previously unknown temporal dynamics of smoking, with possible connections to nicotine metabolism to be explored in future work through a randomized study design. By more effectively predicting smoking events and exploring a self-triggering component of smoking risk, this work supports development of novel or improved cessation interventions that aim to reduce death from smoking.


Online local pool generation for dynamic classifier selection: an extended version

arXiv.org Artificial Intelligence

Dynamic Classifier Selection (DCS) techniques have difficulty in selecting the most competent classifier in a pool, even when its presence is assured. Since the DCS techniques rely only on local data to estimate a classifiers competence, the manner in which the pool is generated could affect the choice of the best classifier for a given instance. That is, the global perspective in which pools are generated may not help the DCS techniques in selecting a competent classifier for instances that are likely to be misclassified. Thus, it is proposed in this work an online pool generation method that produces a locally accurate pool for test samples in difficult regions of the feature space. The difficulty of a given area is determined by the estimated classification difficulty of the instances in it. That way, by using classifiers that were generated in a local scope, it could be easier for the DCS techniques to select the best one for those instances they would most probably misclassify. For the query samples surrounded by easy instances, a simple nearest neighbors rule is used in the proposed method. In the extended version of this work, a deep analysis on the correlation between instance hardness and the performance of DCS techniques is presented. An instance hardness measure that conveys the degree of local class overlap near a given sample is then used to identify in which cases the local pool is used in the proposed scheme. Experimental results show that the DCS techniques were more able to select the most competent classifier for difficult instances when using the proposed local pool than when using a globally generated pool. Moreover, the proposed technique yielded significantly greater recognition rates in comparison to a Bagging-generated pool and two other global generation schemes for all DCS techniques evaluated. The performance of the proposed technique was also significantly superior to three state-of-the-art classification models and was statistically equivalent to five of them. Furthermore, an extended analysis on the computational complexity of the proposed technique and of several DS techniques is presented in this version. We also provide the implementation of the proposed technique using the DESLib library on GitHub. Keywords: Selection Multiple Classifier Systems, Instance Hardness, Pool Generation, Dynamic Classifier 1. Introduction Multiple Classifier Systems (MCS) aim to improve the overall performance of a pattern recognition system by combining numerous base classifiers [1, 2, 3]. An MCS contains three phases [4]: (1) Generation, (2) Selection and (3) Integration. In the first phase, a pool of classifiers is generated using the training data.


An Analysis of Hierarchical Text Classification Using Word Embeddings

arXiv.org Artificial Intelligence

Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not been assessed for the hierarchical text classification (HTC) yet. This study investigates the application of those models and algorithms on this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations---fastText, XGBoost, SVM, and Keras' CNN---and noticeable word embeddings generation methods---GloVe, word2vec, and fastText---with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and its flavors is a very promising approach for HTC.