Performance Analysis
Promoting Fairness through Hyperparameter Optimization
Cruz, André F., Saleiro, Pedro, Belém, Catarina, Soares, Carlos, Bizarro, Pedro
Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development and deployment costs. This work explores, in the context of a real-world fraud detection application, the unfairness that emerges from traditional ML model development, and how to mitigate it with a simple and easily deployed intervention: fairness-aware hyperparameter optimization (HO). We propose and evaluate fairness-aware variants of three popular HO algorithms: Fair Random Search, Fair TPE, and Fairband. Our method enables practitioners to adapt pre-existing business operations to accommodate fairness objectives in a frictionless way and with controllable fairness-accuracy trade-offs. Additionally, it can be coupled with existing bias reduction techniques to tune their hyperparameters. We validate our approach on a real-world bank account opening fraud use case, as well as on three datasets from the fairness literature. Results show that, without extra training cost, it is feasible to find models with 111% average fairness increase and just 6% decrease in predictive accuracy, when compared to standard fairness-blind HO.
Can I Solve It? Identifying APIs Required to Complete OSS Task
Santos, Fabio, Wiese, Igor, Trinkenreich, Bianca, Steinmacher, Igor, Sarma, Anita, Gerosa, Marco
Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as a bug/non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels' relevancy to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
Anomaly detection using principles of human perception
In the fields of statistics and unsupervised machine learning a fundamental and well-studied problem is anomaly detection. Although anomalies are difficult to define, many algorithms have been proposed. Underlying the approaches is the nebulous understanding that anomalies are rare, unusual or inconsistent with the majority of data. The present work gives a philosophical approach to clearly define anomalies and to develop an algorithm for their efficient detection with minimal user intervention. Inspired by the Gestalt School of Psychology and the Helmholtz principle of human perception, the idea is to assume anomalies are observations that are unexpected to occur with respect to certain groupings made by the majority of the data. Thus, under appropriate random variable modelling anomalies are directly found in a set of data under a uniform and independent random assumption of the distribution of constituent elements of the observations; anomalies correspond to those observations where the expectation of occurrence of the elements in a given view is $<1$. Starting from fundamental principles of human perception an unsupervised anomaly detection algorithm is developed that is simple, real-time and parameter-free. Experiments suggest it as the prime choice for univariate data and it shows promising performance on the detection of global anomalies in multivariate data.
What is the Vocabulary of Flaky Tests? An Extended Replication
Camara, B. H. P., Silva, M. A. G., Endo, A. T., Vergilio, S. R.
Software systems have been continuously evolved and delivered with high quality due to the widespread adoption of automated tests. A recurring issue hurting this scenario is the presence of flaky tests, a test case that may pass or fail non-deterministically. A promising, but yet lacking more empirical evidence, approach is to collect static data of automated tests and use them to predict their flakiness. In this paper, we conducted an empirical study to assess the use of code identifiers to predict test flakiness. To do so, we first replicate most parts of the previous study of Pinto~et~al.~(MSR~2020). This replication was extended by using a different ML Python platform (Scikit-learn) and adding different learning algorithms in the analyses. Then, we validated the performance of trained models using datasets with other flaky tests and from different projects. We successfully replicated the results of Pinto~et~al.~(2020), with minor differences using Scikit-learn; different algorithms had performance similar to the ones used previously. Concerning the validation, we noticed that the recall of the trained models was smaller, and classifiers presented a varying range of decreases. This was observed in both intra-project and inter-projects test flakiness prediction.
Interpretable ML-driven Strategy for Automated Trading Pattern Extraction
Sokolovsky, Artur, Arnaboldi, Luca, Bacardit, Jaume, Gross, Thomas
Financial markets are a source of non-stationary multidimensional time series which has been drawing attention for decades. Each financial instrument has its specific changing over time properties, making their analysis a complex task. Improvement of understanding and development of methods for financial time series analysis is essential for successful operation on financial markets. In this study we propose a volume-based data pre-processing method for making financial time series more suitable for machine learning pipelines. We use a statistical approach for assessing the performance of the method. Namely, we formally state the hypotheses, set up associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of the financial time series patterns, and also leads to better classification performance than a price action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models on example of CatBoost estimator, as well as formally assess the relatedness of the proposed approach and SHAP feature interactions with a positive outcome.
MogFace: Rethinking Scale Augmentation on the Face Detector
Liu, Yang, Wang, Fei, Sun, Baigui, Li, Hao
Face detector frequently confronts extreme scale variance challenge. The famous solutions are Multi-scale training, Data-anchor-sampling and Random crop strategy. In this paper, we indicate 2 significant elements to resolve extreme scale variance problem by investigating the difference among the previous solutions, including the fore-ground and back-ground information of an image and the scale information. However, current excellent solutions can only utilize the former information while neglecting to absorb the latter one effectively. In order to help the detector utilize the scale information efficiently, we analyze the relationship between the detector performance and the scale distribution of the training data. Based on this analysis, we propose a Selective Scale Enhancement (SSE) strategy which can assimilate these two information efficiently and simultaneously. Finally, our method achieves state-of-the-art detection performance on all common face detection benchmarks, including AFW, PASCAL face, FDDB and Wider Face datasets. Note that our result achieves six champions on the Wider Face dataset.
Feature Selection for Imbalanced Data with Deep Sparse Autoencoders Ensemble
Massi, Michela C., Ieva, Francesca, Gasperoni, Francesca, Paganoni, Anna Maria
Class imbalance is a common issue in many domain applications of learning algorithms. Oftentimes, in the same domains it is much more relevant to correctly classify and profile minority class observations. This need can be addressed by Feature Selection (FS), that offers several further advantages, s.a. decreasing computational costs, aiding inference and interpretability. However, traditional FS techniques may become sub-optimal in the presence of strongly imbalanced data. To achieve FS advantages in this setting, we propose a filtering FS algorithm ranking feature importance on the basis of the Reconstruction Error of a Deep Sparse AutoEncoders Ensemble (DSAEE). We use each DSAE trained only on majority class to reconstruct both classes. From the analysis of the aggregated Reconstruction Error, we determine the features where the minority class presents a different distribution of values w.r.t. the overrepresented one, thus identifying the most relevant features to discriminate between the two. We empirically demonstrate the efficacy of our algorithm in several experiments on high-dimensional datasets of varying sample size, showcasing its capability to select relevant and generalizable features to profile and classify minority class, outperforming other benchmark FS methods. We also briefly present a real application in radiogenomics, where the methodology was applied successfully.
Prediction of lung and colon cancer through analysis of histopathological images by utilizing Pre-trained CNN models with visualization of class activation and saliency maps
Colon and Lung cancer is one of the most perilous and dangerous ailments that individuals are enduring worldwide and has become a general medical problem. To lessen the risk of death, a legitimate and early finding is particularly required. In any case, it is a truly troublesome task that depends on the experience of histopathologists. If a histologist is under-prepared it may even hazard the life of a patient. As of late, deep learning has picked up energy, and it is being valued in the analysis of Medical Imaging. This paper intends to utilize and alter the current pre-trained CNN-based model to identify lung and colon cancer utilizing histopathological images with better augmentation techniques. In this paper, eight distinctive Pre-trained CNN models, VGG16, NASNetMobile, InceptionV3, InceptionResNetV2, ResNet50, Xception, MobileNet, and DenseNet169 are trained on LC25000 dataset. The model performances are assessed on precision, recall, f1score, accuracy, and auroc score. The results exhibit that all eight models accomplished noteworthy results ranging from 96% to 100% accuracy. Subsequently, GradCAM and SmoothGrad are also used to picture the attention images of Pre-trained CNN models classifying malignant and benign images.
SSD: A Unified Framework for Self-Supervised Outlier Detection
Sehwag, Vikash, Chiang, Mung, Mittal, Prateek
We ask the following question: what training information is required to design an effective outlier/out-of-distribution (OOD) detector, i.e., detecting samples that lie far away from the training distribution? Since unlabeled data is easily accessible for many applications, the most compelling approach is to develop detectors based on only unlabeled in-distribution data. However, we observe that most existing detectors based on unlabeled data perform poorly, often equivalent to a random prediction. In contrast, existing state-of-the-art OOD detectors achieve impressive performance but require access to fine-grained data labels for supervised training. We propose SSD, an outlier detector based on only unlabeled in-distribution data. We use self-supervised representation learning followed by a Mahalanobis distance based detection in the feature space. We demonstrate that SSD outperforms most existing detectors based on unlabeled data by a large margin. Additionally, SSD even achieves performance on par, and sometimes even better, with supervised training based detectors. Finally, we expand our detection framework with two key extensions. First, we formulate few-shot OOD detection, in which the detector has access to only one to five samples from each class of the targeted OOD dataset. Second, we extend our framework to incorporate training data labels, if available. We find that our novel detection framework based on SSD displays enhanced performance with these extensions, and achieves state-of-the-art performance. Our code is publicly available at https://github.com/inspire-group/SSD.
Fairness Perceptions of Algorithmic Decision-Making: A Systematic Review of the Empirical Literature
Starke, Christopher, Baleis, Janine, Keller, Birte, Marcinkowski, Frank
Algorithmic decision-making (ADM) increasingly shapes people's daily lives. Given that such autonomous systems can cause severe harm to individuals and social groups, fairness concerns have arisen. A human-centric approach demanded by scholars and policymakers requires taking people's fairness perceptions into account when designing and implementing ADM. We provide a comprehensive, systematic literature review synthesizing the existing empirical insights on perceptions of algorithmic fairness from 39 empirical studies spanning multiple domains and scientific disciplines. Through thorough coding, we systemize the current empirical literature along four dimensions: (a) algorithmic predictors, (b) human predictors, (c) comparative effects (human decision-making vs. algorithmic decision-making), and (d) consequences of ADM. While we identify much heterogeneity around the theoretical concepts and empirical measurements of algorithmic fairness, the insights come almost exclusively from Western-democratic contexts. By advocating for more interdisciplinary research adopting a society-in-the-loop framework, we hope our work will contribute to fairer and more responsible ADM.