Accuracy
SimpSON: Simplifying Photo Cleanup with Single-Click Distracting Object Segmentation Network
Huynh, Chuong, Zhou, Yuqian, Lin, Zhe, Barnes, Connelly, Shechtman, Eli, Amirghodsi, Sohrab, Shrivastava, Abhinav
In photo editing, it is common practice to remove visual distractions to improve the overall image quality and highlight the primary subject. However, manually selecting and removing these small and dense distracting regions can be a laborious and time-consuming task. In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. Our method surpasses the precision and recall achieved by the traditional method of running panoptic segmentation and then selecting the segments containing the clicks. We also showcase how a transformer-based module can be used to identify more distracting regions similar to the user's click position. Our experiments demonstrate that the model can effectively and accurately segment unknown distracting objects interactively and in groups. By significantly simplifying the photo cleaning and retouching process, our proposed model provides inspiration for exploring rare object segmentation and group selection with a single click.
Deep Variational Lesion-Deficit Mapping
Pombo, Guilherme, Gray, Robert, Nelson, Amy P. K., Foulon, Chris, Ashburner, John, Nachev, Parashkev
Causal mapping of the functional organisation of the human brain requires evidence of \textit{necessity} available at adequate scale only from pathological lesions of natural origin. This demands inferential models with sufficient flexibility to capture both the observable distribution of pathological damage and the unobserved distribution of the neural substrate. Current model frameworks -- both mass-univariate and multivariate -- either ignore distributed lesion-deficit relations or do not model them explicitly, relying on featurization incidental to a predictive task. Here we initiate the application of deep generative neural network architectures to the task of lesion-deficit inference, formulating it as the estimation of an expressive hierarchical model of the joint lesion and deficit distributions conditioned on a latent neural substrate. We implement such deep lesion deficit inference with variational convolutional volumetric auto-encoders. We introduce a comprehensive framework for lesion-deficit model comparison, incorporating diverse candidate substrates, forms of substrate interactions, sample sizes, noise corruption, and population heterogeneity. Drawing on 5500 volume images of ischaemic stroke, we show that our model outperforms established methods by a substantial margin across all simulation scenarios, including comparatively small-scale and noisy data regimes. Our analysis justifies the widespread adoption of this approach, for which we provide an open source implementation: https://github.com/guilherme-pombo/vae_lesion_deficit
Financial misstatement detection: a realistic evaluation
Zavitsanos, Elias, Mavroeidis, Dimitris, Bougiatiotis, Konstantinos, Spyropoulou, Eirini, Loukas, Lefteris, Paliouras, Georgios
In this work, we examine the evaluation process for the task of detecting financial reports with a high risk of containing a misstatement. This task is often referred to, in the literature, as ``misstatement detection in financial reports''. We provide an extensive review of the related literature. We propose a new, realistic evaluation framework for the task which, unlike a large part of the previous work: (a) focuses on the misstatement class and its rarity, (b) considers the dimension of time when splitting data into training and test and (c) considers the fact that misstatements can take a long time to detect. Most importantly, we show that the evaluation process significantly affects system performance, and we analyze the performance of different models and feature types in the new realistic framework.
Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage System
Schlosser, Tobias, Beuth, Frederik, Meyer, Trixy, Kumar, Arunodhayan Sampath, Stolze, Gabriel, Furashova, Olga, Engelmann, Katrin, Kowerko, Danny
In ophthalmology, intravitreal operative medication therapy (IVOM) is a widespread treatment for diseases related to the age-related macular degeneration (AMD), the diabetic macular edema (DME), as well as the retinal vein occlusion (RVO). However, in real-world settings, patients often suffer from loss of vision on time scales of years despite therapy, whereas the prediction of the visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions is challenging due to heterogeneous and incomplete data. In this contribution, we present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital. The extensive data corpus allows predictive statements of the expected progression of a patient and his or her VA in each of the three diseases. We found out for the disease AMD a significant deterioration of the visual acuity over time. Within our proposed multistage system, we classify the VA progression into the three groups of therapy "winners", "stabilizers", and "losers" (WSL scheme). Our OCT biomarker classification using an ensemble of deep neural networks results in a classification accuracy (F1-score) of over 98 %, enabling us to complete incomplete OCT documentations while allowing us to exploit them for a more precise VA modelling process. Our VA prediction requires at least four VA examinations and optionally OCT biomarkers from the same time period to predict the VA progression within a forecasted time frame, whereas our prediction is currently restricted to IVOM / no therapy. While achieving a prediction accuracy of up to 69 % (macro average F1-score) when considering all three WSL-based progression groups, this corresponds to an improvement by 11.2 % in comparison to our ophthalmic expertise (57.8 %).
Decom--CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map
Yang, Yuguang, Guo, Runtang, Wu, Sheng, Wang, Yimi, Zhang, Juan, Gong, Xuan, Zhang, Baochang
Interpretation of deep learning remains a very challenging problem. Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions. Furthermore, existing evaluation protocols often overlook the correlation between interpretability performance and the model's decision quality, which presents a more fundamental issue. This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM), which offers a feature-level interpretation of the model's prediction. Decom-CAM decomposes intermediate activation maps into orthogonal features using singular value decomposition and generates saliency maps by integrating them. The orthogonality of features enables CAM to capture local features and can be used to pinpoint semantic components such as eyes, noses, and faces in the input image, making it more beneficial for deep model interpretation. To ensure a comprehensive comparison, we introduce a new evaluation protocol by dividing the dataset into subsets based on classification accuracy results and evaluating the interpretability performance on each subset separately. Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly by generating more precise saliency maps across all levels of classification accuracy. Combined with our feature-level interpretability approach, this paper could pave the way for a new direction for understanding the decision-making process of deep neural networks.
D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias
Hassan, Sabit, Alikhani, Malihe
Despite recent advancements, NLP models continue to be vulnerable to bias. This bias often originates from the uneven distribution of real-world data and can propagate through the annotation process. Escalated integration of these models in our lives calls for methods to mitigate bias without overbearing annotation costs. While active learning (AL) has shown promise in training models with a small amount of annotated data, AL's reliance on the model's behavior for selective sampling can lead to an accumulation of unwanted bias rather than bias mitigation. However, infusing clustering with AL can overcome the bias issue of both AL and traditional annotation methods while exploiting AL's annotation efficiency. In this paper, we propose a novel adaptive clustering-based active learning algorithm, D-CALM, that dynamically adjusts clustering and annotation efforts in response to an estimated classifier error-rate. Experiments on eight datasets for a diverse set of text classification tasks, including emotion, hatespeech, dialog act, and book type detection, demonstrate that our proposed algorithm significantly outperforms baseline AL approaches with both pretrained transformers and traditional Support Vector Machines. D-CALM showcases robustness against different measures of information gain and, as evident from our analysis of label and error distribution, can significantly reduce unwanted model bias.
An Improved Model Ensembled of Different Hyper-parameter Tuned Machine Learning Algorithms for Fetal Health Prediction
Talukder, Md. Simul Hasan, Akter, Sharmin
Fetal health is a critical concern during pregnancy as it can impact the well-being of both the mother and the baby. Regular monitoring and timely interventions are necessary to ensure the best possible outcomes. While there are various methods to monitor fetal health in the mother's womb, the use of artificial intelligence (AI) can improve the accuracy, efficiency, and speed of diagnosis. In this study, we propose a robust ensemble model called ensemble of tuned Support Vector Machine and ExtraTrees (ETSE) for predicting fetal health. Initially, we employed various data preprocessing techniques such as outlier rejection, missing value imputation, data standardization, and data sampling. Then, seven machine learning (ML) classifiers including Support Vector Machine (SVM), XGBoost (XGB), Light Gradient Boosting Machine (LGBM), Decision Tree (DT), Random Forest (RF), ExtraTrees (ET), and K-Neighbors were implemented. These models were evaluated and then optimized by hyperparameter tuning using the grid search technique. Finally, we analyzed the performance of our proposed ETSE model. The performance analysis of each model revealed that our proposed ETSE model outperformed the other models with 100% precision, 100% recall, 100% F1-score, and 99.66% accuracy. This indicates that the ETSE model can effectively predict fetal health, which can aid in timely interventions and improve outcomes for both the mother and the baby.
GVdoc: Graph-based Visual Document Classification
Mohbat, Fnu, Zaki, Mohammed J., Finegan-Dollak, Catherine, Verma, Ashish
The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to have a hard time correctly classifying and differentiating out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require a lot of computing power during inference, making them impractical for many real-world applications. We propose, GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.
Characterizing and Measuring Linguistic Dataset Drift
Chang, Tyler A., Halder, Kishaloy, John, Neha Anna, Vyas, Yogarshi, Benajiba, Yassine, Ballesteros, Miguel, Roth, Dan
NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often used in practice. In this paper, we propose three dimensions of linguistic dataset drift: vocabulary, structural, and semantic drift. These dimensions correspond to content word frequency divergences, syntactic divergences, and meaning changes not captured by word frequencies (e.g. lexical semantic change). We propose interpretable metrics for all three drift dimensions, and we modify past performance prediction methods to predict model performance at both the example and dataset level for English sentiment classification and natural language inference. We find that our drift metrics are more effective than previous metrics at predicting out-of-domain model accuracies (mean 16.8% root mean square error decrease), particularly when compared to popular fine-tuned embedding distances (mean 47.7% error decrease). Fine-tuned embedding distances are much more effective at ranking individual examples by expected performance, but decomposing into vocabulary, structural, and semantic drift produces the best example rankings of all considered model-agnostic drift metrics (mean 6.7% ROC AUC increase).
Discussion of Features for Acoustic Anomaly Detection under Industrial Disturbing Noise in an End-of-Line Test of Geared Motors
Wissbrock, Peter, Pelkmann, David, Richter, Yvonne
In the end-of-line test of geared motors, the evaluation of product qual-ity is important. Due to time constraints and the high diversity of variants, acous-tic measurements are more economical than vibration measurements. However, the acoustic data is affected by industrial disturbing noise. Therefore, the aim of this study is to investigate the robustness of features used for anomaly detection in geared motor end-of-line testing. A real-world dataset with typical faults and acoustic disturbances is recorded by an acoustic array. This includes industrial noise from the production and systematically produced disturbances, used to compare the robustness. Overall, it is proposed to apply features extracted from a log-envelope spectrum together with psychoacoustic features. The anomaly de-tection is done by using the isolation forest or the more universal bagging random miner. Most disturbances can be circumvented, while the use of a hammer or air pressure often causes problems. In general, these results are important for condi-tion monitoring tasks that are based on acoustic or vibration measurements. Fur-thermore, a real-world problem description is presented to improve common sig-nal processing and machine learning tasks.