Performance Analysis
Australia: Omicron death, false negative COVID results
Australia has reported its first confirmed death from the new Omicron variant of COVID-19 amid another surge in daily infections. The authorities, however, refrained from imposing new restrictions, saying hospital admission rates remained low. The death on Monday of a man in his 80s with underlying health conditions marked a grim milestone for Australia that has had to reverse some parts of a staged reopening after nearly two years of stop-start lockdowns due to the fresh outbreak. Omicron, which health experts say appears more contagious but less virulent than previous strains, began to spread in the country just as it lifted restrictions on most domestic borders and allowed Australians to return from overseas without quarantine, driving case numbers to the highest levels since the start of the pandemic. The authorities gave no additional details about the Omicron death, except to say that the man caught the virus at an aged care facility and died in a Sydney hospital.
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels
Zhang, Youcai, Cheng, Yuhao, Huang, Xinyu, Wen, Fei, Feng, Rui, Li, Yaqian, Guo, Yandong
Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increase the complexity of implementation. This work seeks to fulfill the potential of loss function in MLML without increasing the procedure and complexity. Toward this end, we propose two simple yet effective methods via robust loss design based on an observation that a model can identify missing labels during training with a high precision. The first is a novel robust loss for negatives, namely the Hill loss, which re-weights negatives in the shape of a hill to alleviate the effect of false negatives. The second is a self-paced loss correction (SPLC) method, which uses a loss derived from the maximum likelihood criterion under an approximate distribution of missing labels. Comprehensive experiments on a vast range of multi-label image classification datasets demonstrate that our methods can remarkably boost the performance of MLML and achieve new state-of-the-art loss functions in MLML.
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech
Kebe, Gaoussou Youssouf, Richards, Luke E., Raff, Edward, Ferraro, Francis, Matuszek, Cynthia
Learning to understand grounded language, which connects natural language to percepts, is a critical research area. Prior work in grounded language acquisition has focused primarily on textual inputs. In this work we demonstrate the feasibility of performing grounded language acquisition on paired visual percepts and raw speech inputs. This will allow interactions in which language about novel tasks and environments is learned from end users, reducing dependence on textual inputs and potentially mitigating the effects of demographic bias found in widely available speech recognition systems. We leverage recent work in self-supervised speech representation models and show that learned representations of speech can make language grounding systems more inclusive towards specific groups while maintaining or even increasing general performance.
Learning from Disagreement: A Survey
Uma, Alexandra N., Fornaciari, Tommaso, Hovy, Dirk, Paun, Silviu, Plank, Barbara, Poesio, Massimo
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. All datasets and models employed in this paper are freely available as supplementary materials.
Evaluating classification models with Accuracy, Precision and Recall.
Hope you are doing well. We're in December and is time to review the year that passed and evaluate how well we performed. And of course as Machine Learning Engineers we would not be able to do that without also evaluating our classification algorithms! So... Grab a cup of coffee because we are going to talk about four important metrics for evaluating our models in this article. Suppose we are a classification algorithm that is predicting whether or not is going to rain.
Time Series Data Mining Algorithms Towards Scalable and Real-Time Behavior Monitoring
In recent years, there have been unprecedented technological advances in sensor technology, and sensors have become more affordable than ever. Thus, sensor-driven data collection is increasingly becoming an attractive and practical option for researchers around the globe. Such data is typically extracted in the form of time series data, which can be investigated with data mining techniques to summarize behaviors of a range of subjects including humans and animals. While enabling cheap and mass collection of data, continuous sensor data recording results in datasets which are big in size and volume, which are challenging to process and analyze with traditional techniques in a timely manner. Such collected sensor data is typically extracted in the form of time series data. There are two main approaches in the literature, namely, shape-based classification and feature-based classification. Shape-based classification determines the best class according to a distance measure. Feature-based classification, on the other hand, measures properties of the time series and finds the best class according to the set of features defined for the time series. In this dissertation, we demonstrate that neither of the two techniques will dominate for some problems, but that some combination of both might be the best. In other words, on a single problem, it might be possible that one of the techniques is better for one subset of the behaviors, and the other technique is better for another subset of behaviors. We introduce a hybrid algorithm to classify behaviors, using both shape and feature measures, in weakly labeled time series data collected from sensors to quantify specific behaviors performed by the subject. We demonstrate that our algorithm can robustly classify real, noisy, and complex datasets, based on a combination of shape and features, and tested our proposed algorithm on real-world datasets.
Explainable Signature-based Machine Learning Approach for Identification of Faults in Grid-Connected Photovoltaic Systems
The transformation of conventional power networks into smart grids with the heavy penetration level of renewable energy resources, particularly grid-connected Photovoltaic (PV) systems, has increased the need for efficient fault identification systems. Malfunctioning any single component in grid-connected PV systems may lead to grid instability and other serious consequences, showing that a reliable fault identification system is the utmost requirement for ensuring operational integrity. Therefore, this paper presents a novel fault identification approach based on statistical signatures of PV operational states. These signatures are unique because each fault has a different nature and distinctive impact on the electrical system. Thus, the Random Forest Classifier trained on these extracted signatures showed 100% accuracy in identifying all types of faults. Furthermore, the performance comparison of the proposed framework with other Machine Learning classifiers depicts its credibility. Moreover, to elevate user trust in the predicted outcomes, SHAP (Shapley Additive Explanation) was utilized during the training phase to extract a complete model response (global explanation). This extracted global explanation can help in the assessment of predicted outcomes credibility by decoding each prediction in terms of features contribution. Hence, the proposed explainable signature-based fault identification technique is highly credible and fulfills all the requirements of smart grids.
Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems
The accuracy of binary classification systems is defined as the proportion of correct predictions - both positive and negative - made by a classification model or computational algorithm. A value between 0 (no accuracy) and 1 (perfect accuracy), the accuracy of a classification model is dependent on several factors, notably: the classification rule or algorithm used, the intrinsic characteristics of the tool used to do the classification, and the relative frequency of the elements being classified. Several accuracy metrics exist, each with its own advantages in different classification scenarios. In this manuscript, we show that relative to a perfect accuracy of 1, the positive prevalence threshold ($\phi_e$), a critical point of maximum curvature in the precision-prevalence curve, bounds the $F{_{\beta}}$ score between 1 and 1.8/1.5/1.2 for $\beta$ values of 0.5/1.0/2.0, respectively; the $F_1$ score between 1 and 1.5, and the Fowlkes-Mallows Index (FM) between 1 and $\sqrt{2} \approx 1.414$. We likewise describe a novel $negative$ prevalence threshold ($\phi_n$), the level of sharpest curvature for the negative predictive value-prevalence curve, such that $\phi_n$ $>$ $\phi_e$. The area between both these thresholds bounds the Matthews Correlation Coefficient (MCC) between $\sqrt{2}/2$ and $\sqrt{2}$. Conversely, the ratio of the maximum possible accuracy to that at any point below the prevalence threshold, $\phi_e$, goes to infinity with decreasing prevalence. Though applications are numerous, the ideas herein discussed may be used in computational complexity theory, artificial intelligence, and medical screening, amongst others. Where computational time is a limiting resource, attaining the prevalence threshold in binary classification systems may be sufficient to yield levels of accuracy comparable to that under maximum prevalence.
Application of Markov Structure of Genomes to Outlier Identification and Read Classification
Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel
That the sequential structure of genomes is important has been known since the discovery of DNA. In this paper we employ a statistics and stochastic process perspective on triplets of successive bases to address two important applications: identifying outliers in genome databases, and classifying reads in the metagenomic context of reference-guided assembly. From this stochastic process perspective, triplets are a second-order Markov chain specified by the distribution of each base conditional on its two immediate predecessors. To be sure, studying genomes via base sequence distributions is not novel. Previous papers have addressed genome signatures (Karlin et al., 1997; Campbell et al., 1999; Takashi et al., 2003), as well as frequentist (Rosen et al., 2008) and Bayesian (Wang et al., 2007) approaches to classification problems.
There is an elephant in the room: Towards a critique on the use of fairness in biometrics
Valdivia, Ana, Corbera-Serrajòrdia, Júlia, Swianiewicz, Aneta
In 2019, the UK's Immigration and Asylum Chamber of the Upper Tribunal dismissed an asylum appeal basing the decision on the output of a biometric system, alongside other discrepancies. The fingerprints of the asylum seeker were found in a biometric database which contradicted the appellant's account. The Tribunal found this evidence unequivocal and denied the asylum claim. Nowadays, the proliferation of biometric systems is shaping public debates around its political, social and ethical implications. Yet whilst concerns towards the racialised use of this technology for migration control have been on the rise, investment in the biometrics industry and innovation is increasing considerably. Moreover, fairness has also been recently adopted by biometrics to mitigate bias and discrimination on biometrics. However, algorithmic fairness cannot distribute justice in scenarios which are broken or intended purpose is to discriminate, such as biometrics deployed at the border. In this paper, we offer a critical reading of recent debates about biometric fairness and show its limitations drawing on research in fairness in machine learning and critical border studies. Building on previous fairness demonstrations, we prove that biometric fairness criteria are mathematically mutually exclusive. Then, the paper moves on illustrating empirically that a fair biometric system is not possible by reproducing experiments from previous works. Finally, we discuss the politics of fairness in biometrics by situating the debate at the border. We claim that bias and error rates have different impact on citizens and asylum seekers. Fairness has overshadowed the elephant in the room of biometrics, focusing on the demographic biases and ethical discourses of algorithms rather than examine how these systems reproduce historical and political injustices.