Performance Analysis
An Impartial Take to the CNN vs Transformer Robustness Contest
Pinto, Francesco, Torr, Philip H. S., Dokania, Puneet K.
Following the surge in popularity of Transformers in Computer Vision, several studies have attempted to determine whether they could be more robust to distribution shifts and provide better uncertainty estimates than Convolutional Neural Networks (CNNs). The almost unanimous conclusion is that they are, and it is often conjectured, more or less explicitly, that this supposed superiority is attributable to the self-attention mechanism. In this paper we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly ConvNeXt [20]) can be as robust and reliable as, and sometimes even more so than, the current state-of-the-art Transformers. However, there is no clear winner. Therefore, although it is tempting to state the definitive superiority of one family of architectures over another, they seem to enjoy similarly extraordinary performance on a variety of tasks while also suffering from similar vulnerabilities such as texture, background, and simplicity biases.
FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification
Lin, Xiaofeng, Kim, Seungbae, Joo, Jungseock
Existing pruning techniques preserve deep neural networks' overall ability to make correct predictions but could also amplify hidden biases during the compression process. We propose a novel pruning method, Fairness-aware GRAdient Pruning mEthod (FairGRAPE), that minimizes the disproportionate impacts of pruning on different sub-groups. Our method calculates the per-group importance of each model weight and selects a subset of weights that maintain the relative between-group total importance in pruning. The proposed method then prunes network edges with small importance values and repeats the procedure by updating importance values. We demonstrate the effectiveness of our method on four different datasets, FairFace, UTK-Face, CelebA, and ImageNet, for the tasks of face attribute classification where our method reduces the disparity in performance degradation by up to 90% compared to the state-of-the-art pruning algorithms. Our method is substantially more effective in a setting with a high pruning rate (99%).
Open video data sharing in developmental and behavioural science
Marschik, Peter B, Kulvicius, Tomas, Flügge, Sarah, Widmann, Claudius, Nielsen-Saines, Karin, Schulte-Rüther, Martin, Hüning, Britta, Bölte, Sven, Poustka, Luise, Sigafoos, Jeff, Wörgötter, Florentin, Einspieler, Christa, Zhang, Dajie
Video recording is a widely used method for documenting infant and child behaviours in research and clinical practice. Video data has rarely been shared due to ethical concerns about confidentiality, although the need for shared large-scale datasets keeps increasing. This demand is even more pressing when data-driven computer-based approaches are involved, such as screening tools to complement clinical assessments. To share data while abiding by privacy protection rules, a critical question arises: do efforts at data de-identification reduce data utility? We addressed this question using Prechtl's general movements assessment (GMA), an established and globally practised video-based diagnostic tool in early infancy for detecting neurological deficits, such as cerebral palsy. To date, no shared expert-annotated large data repositories for infant movement analyses exist. Such datasets would massively benefit training and recalibration of human assessors and the development of computer-based approaches. In the current study, sequences from a prospective longitudinal infant cohort with a total of 19451 available general movements video snippets were randomly selected for human clinical reasoning and computer-based analysis. We demonstrated for the first time that pseudonymisation by face-blurring video recordings is a viable approach. The video redaction did not affect classification accuracy for either human assessors or computer vision methods, suggesting an adequate and easy-to-apply solution for sharing movement video data. We call for further exploration of efficient and privacy rule-conforming approaches for de-identifying video data in scientific and clinical fields beyond movement assessments. These approaches should enable sharing and merging stand-alone video datasets into large data pools to advance science and public health.
Classification via score-based generative modelling
In this work, we investigated the application of score-based gradient learning in discriminative and generative classification settings. The score function can be used to characterize a data distribution as an alternative to the density. It can be learned efficiently via score matching and used to flexibly generate credible samples that enhance discriminative classification quality, to recover the density, and to build generative classifiers. We analysed the decision theories involving score-based representations and performed experiments on simulated and real-world datasets, demonstrating the effectiveness of this approach in achieving and improving binary classification performance and robustness to perturbations, particularly in high-dimensional and imbalanced settings.
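As a small illustration of what "score function" means here, the sketch below uses the standard definition, the gradient of the log-density with respect to the input, for a one-dimensional Gaussian. It is not the method of the paper: the function names are hypothetical, and the finite-difference check only confirms that the closed-form score matches the derivative of the log-density.

```python
import math


def gaussian_log_density(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def gaussian_score(x, mu, sigma):
    """Closed-form score: d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2


def numerical_score(log_density, x, eps=1e-5):
    """Central finite-difference estimate of d/dx log p(x)."""
    return (log_density(x + eps) - log_density(x - eps)) / (2 * eps)


# Illustrative parameters (assumptions, not values from the paper).
mu, sigma, x = 0.0, 2.0, 1.5
analytic = gaussian_score(x, mu, sigma)
numeric = numerical_score(lambda v: gaussian_log_density(v, mu, sigma), x)
print(analytic, numeric)
```

The score does not depend on the normalizing constant of the density, which is one reason it can serve as an alternative description of a distribution.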
Algorithmic Fairness in Business Analytics: Directions for Research and Practice
De-Arteaga, Maria, Feuerriegel, Stefan, Saar-Tsechansky, Maytal
The extensive adoption of business analytics (BA) has brought financial gains and increased efficiencies. However, these advances have simultaneously drawn attention to rising legal and ethical challenges when BA inform decisions with fairness implications. As a response to these concerns, the emerging study of algorithmic fairness deals with algorithmic outputs that may result in disparate outcomes or other forms of injustices for subgroups of the population, especially those who have been historically marginalized. Fairness is relevant on the basis of legal compliance, social responsibility, and utility; if not adequately and systematically addressed, unfair BA systems may lead to societal harms and may also threaten an organization's own survival, its competitiveness, and overall performance. This paper offers a forward-looking, BA-focused review of algorithmic fairness. We first review the state-of-the-art research on sources and measures of bias, as well as bias mitigation algorithms. We then provide a detailed discussion of the utility-fairness relationship, emphasizing that the frequent assumption of a trade-off between these two constructs is often mistaken or short-sighted. Finally, we chart a path forward by identifying opportunities for business scholars to address impactful, open challenges that are key to the effective and responsible deployment of BA.
A pitfall for machine learning methods aiming to predict across cell types - Genome Biology
Machine learning has been applied to a wide variety of genomic prediction problems, such as predicting transcription factor binding, identifying active cis-regulatory elements, constructing gene regulatory networks, and predicting the effects of single nucleotide polymorphisms. The inputs to these models typically include some combination of nucleotide sequence and signals from epigenomics assays. Given such data, the most common approach to evaluating predictive models is a "cross-chromosomal" strategy, which involves training a separate model for each cell type and partitioning genomic loci into some number of folds for cross-validation (Figure 1a). Typically, the genomic loci are split by chromosome. This strategy has been employed for models that predict gene expression [1–3], elements of chromatin architecture [4, 5], transcription factor binding [6, 7], and cis-regulatory elements [8–13]. Although the cross-chromosomal approach measures how well the model generalizes to new genomic loci, it does not measure how well the model generalizes to new cell types.
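The cross-chromosomal strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the article: the record layout (dictionaries with "chrom", "cell_type", and "position" keys), the function names, and the round-robin assignment of chromosomes to folds are all hypothetical, and the cell-type labels are placeholders.

```python
from collections import defaultdict


def chromosome_folds(examples, n_folds):
    """Group examples into folds so that each chromosome lies in exactly one fold."""
    chroms = sorted({ex["chrom"] for ex in examples})
    # Round-robin assignment of chromosomes to folds (an arbitrary choice).
    fold_of = {chrom: i % n_folds for i, chrom in enumerate(chroms)}
    folds = defaultdict(list)
    for ex in examples:
        folds[fold_of[ex["chrom"]]].append(ex)
    return [folds[i] for i in range(n_folds)]


def cross_chromosomal_splits(examples, cell_type, n_folds):
    """Yield (train, test) pairs for ONE cell type, split by chromosome."""
    own = [ex for ex in examples if ex["cell_type"] == cell_type]
    folds = chromosome_folds(own, n_folds)
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Toy data: two placeholder cell types, four chromosomes, two loci each.
examples = [
    {"chrom": chrom, "cell_type": cell_type, "position": pos}
    for cell_type in ("K562", "GM12878")
    for chrom in ("chr1", "chr2", "chr3", "chr4")
    for pos in (1000, 2000)
]
splits = list(cross_chromosomal_splits(examples, "K562", n_folds=2))
for train, test in splits:
    print(sorted({ex["chrom"] for ex in train}), sorted({ex["chrom"] for ex in test}))
```

In every split the training and test chromosomes are disjoint, but both sides come from the same cell type, so the evaluation says nothing about unseen cell types.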
A confusion matrix is mainly used in machine learning and deep learning to evaluate the performance of a model by showing all of its predictions in a table. It is typically used to evaluate a classification model (classifier) by calculating precision, recall, and F1 score from the matrix. As the name suggests, the terminology used in the table can be confusing. In the dog vs. cat example, we have designed and trained a model to predict one of two values: dog or cat. To create the confusion matrix, we take one class at a time. In our case we take dog first and then compute four values: TP, TN, FP, and FN.
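The dog vs. cat case can be worked through in a few lines of Python. This is a minimal sketch with made-up labels, treating "dog" as the positive class; the function names and the example data are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN, treating `positive` as the class of interest."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted dog, actually dog
        elif pred == positive:
            fp += 1  # predicted dog, actually cat
        elif truth == positive:
            fn += 1  # predicted cat, actually dog
        else:
            tn += 1  # predicted cat, actually cat
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up ground truth and predictions for eight images.
y_true = ["dog", "dog", "dog", "cat", "cat", "cat", "dog", "cat"]
y_pred = ["dog", "cat", "dog", "cat", "dog", "cat", "dog", "cat"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="dog")
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, tn, fp, fn)         # 3 3 1 1
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Repeating the same count with "cat" as the positive class swaps the roles of TP and TN and of FP and FN.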
Uncertainty quantification for predictions of atomistic neural networks
Vazquez-Salazar, Luis Itza, Boittier, Eric D., Meuwly, M.
The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model was evaluated with different metrics to quantify calibration, quality of predictions, and whether prediction error and the predicted uncertainty can be correlated. The results from training on the QM9 database and evaluating data from the test set within and outside the distribution indicate that error and uncertainty are not linearly related. The results clarify that noise and redundancy complicate property prediction for molecules even in cases for which changes - e.g. double bond migration in two otherwise identical molecules - are small. The model was then applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space combined with other parameters shows that redundant information in the training dataset can lead to large variances and small errors whereas the presence of similar but unspecific information returns large errors but small variances. This was, e.g., observed for nitro-containing aliphatic chains for which predictions were difficult although the training set contained several examples for nitro groups bound to aromatic molecules. This underlines the importance of the composition of the training data and provides chemical insight into how this affects the prediction capabilities of a ML model. Finally, the approach put forward can be used for information-based improvement of chemical databases for target applications through active learning optimization.
Efficient Graph-Friendly COCO Metric Computation for Train-Time Model Evaluation
Evaluating the COCO mean average precision (MaP) and COCO recall metrics as part of the static computation graph of modern deep learning frameworks poses a unique set of challenges. These challenges include the need to maintain a dynamically sized state to compute mean average precision, reliance on global dataset-level statistics to compute the metrics, and the handling of differing numbers of bounding boxes between images in a batch. As a consequence, it is common practice for researchers and practitioners to evaluate COCO metrics as a post-training evaluation step. With a graph-friendly algorithm to compute COCO mean average precision and recall, these metrics could be evaluated at training time, improving visibility into the evolution of the metrics through training curve plots, and decreasing iteration time when prototyping new model versions. Our contributions include an accurate approximation algorithm for mean average precision, an open-source implementation of both COCO mean average precision and COCO recall, extensive numerical benchmarks to verify the accuracy of our implementations, and an open-source training loop that includes train-time evaluation of mean average precision and recall.
CLIP2TV: Align, Match and Distill for Video-Text Retrieval
Gao, Zijian, Liu, Jingyu, Sun, Weiqi, Chen, Sheng, Chang, Dedan, Zhao, Lili
Modern video-text retrieval frameworks basically consist of three parts: a video encoder, a text encoder, and a similarity head. With the success of both visual and textual representation learning, transformer-based encoders and fusion methods have also been adopted in the field of video-text retrieval. In this paper, we propose a new CLIP-based framework called CLIP2TV, which consists of a video-text alignment module and a video-text matching module. The two modules are trained end-to-end in a coordinated manner and boost each other's performance. Moreover, to address the impairment caused by data noise, especially false negatives introduced by vague descriptions in some datasets, we propose similarity distillation to alleviate the problem. Extensive experimental results on various datasets validate the effectiveness of the proposed methods. Finally, on common datasets with video clips of various lengths, CLIP2TV achieves better or competitive results compared with previous SOTA methods.