Accuracy
Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification
With the abundance of industrial datasets, imbalanced classification has become a common problem in several application domains. Oversampling is an effective method to solve imbalanced classification. One of the main challenges of the existing oversampling methods is to accurately label the new synthetic samples. Inaccurate labels of the synthetic samples would distort the distribution of the dataset and possibly worsen the classification performance. This paper introduces the idea of weakly supervised learning to handle the inaccurate labeling of synthetic samples caused by traditional oversampling methods. Graph semi-supervised SMOTE is developed to improve the credibility of the synthetic samples' labels. In addition, we propose cost-sensitive neighborhood components analysis for high-dimensional datasets and a bootstrap-based ensemble framework for highly imbalanced datasets. The proposed method achieves good classification performance on 8 synthetic datasets and 3 real-world datasets, especially for high imbalance and high dimensionality problems. Its average performance and robustness are better than those of the benchmark methods.
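For readers unfamiliar with the baseline, the following is a minimal sketch of classic SMOTE interpolation, not the graph semi-supervised variant the abstract proposes. The function name `smote`, its parameters, and the toy points are illustrative assumptions. Each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours, and classic SMOTE simply assumes it inherits the minority label, which is the labeling assumption the abstract calls into question.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE sketch: interpolate between a minority point and
    one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class: the four corners of the unit square (made-up data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
```

Every point in `new_points` is labeled as minority by construction, regardless of whether majority samples lie nearby.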
GLOD: Gaussian Likelihood Out of Distribution Detector
Amit, Guy, Levy, Moshe, Rosenberg, Ishai, Shabtai, Asaf, Elovici, Yuval
Discriminative deep neural networks (DNNs) do well at classifying input associated with the classes they have been trained on. However, out-of-distribution (OOD) input poses a great challenge to such models and consequently represents a major risk when these models are used in safety-critical systems. In the last two years, extensive research has been performed in the domain of OOD detection. This research has relied mainly on training the model with OOD data or requiring additional computation for OOD detection. Such methods may not be applicable in many real world use cases. In this paper, we propose GLOD -- Gaussian likelihood out of distribution detector -- an extended DNN classifier capable of efficiently detecting OOD samples with no additional runtime overhead and without auxiliary training data. GLOD uses a layer that models the Gaussian density function of the trained classes. The layer outputs are used to estimate a Log-Likelihood Ratio which is employed to detect OOD samples. We evaluate GLOD's detection performance on SVHN, CIFAR-10 and CIFAR-100.
LOGAN: Local Group Bias Detection by Clustering
Machine learning techniques have been widely used in natural language processing (NLP). However, as revealed by many recent studies, machine learning models often inherit and amplify the societal biases in data. Various metrics have been proposed to quantify biases in model predictions. In particular, several of them evaluate disparity in model performance between protected groups and advantaged groups in the test corpus. However, we argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model. In fact, a model with similar aggregated performance between different groups on the entire data may behave differently on instances in a local region. To analyze and detect such local bias, we propose LOGAN, a new bias detection technique based on clustering. Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region and allows us to better analyze the biases in model predictions.
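The difference between corpus-level and local bias can be illustrated with a small sketch. It assumes cluster ids are already available (the abstract does not describe LOGAN's clustering objective, so none is implemented here), and the group names, the helper `accuracy_gap`, and the toy records are all hypothetical.

```python
from collections import defaultdict


def accuracy_gap(records):
    """Accuracy of group 'a' minus accuracy of group 'b' (hypothetical helper)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return hits["a"] / totals["a"] - hits["b"] / totals["b"]


# (cluster id, group, prediction correct?) -- made-up data, cluster ids assumed given
data = [
    (0, "a", True), (0, "a", True), (0, "b", False), (0, "b", True),
    (1, "a", False), (1, "a", True), (1, "b", True), (1, "b", True),
]

# Corpus-level gap over all instances.
global_gap = accuracy_gap([(g, c) for _, g, c in data])

# Local gap inside each cluster.
by_cluster = defaultdict(list)
for cluster, group, correct in data:
    by_cluster[cluster].append((group, correct))
local_gaps = {cluster: accuracy_gap(recs) for cluster, recs in by_cluster.items()}

print(global_gap)  # 0.0
print(local_gaps)  # {0: 0.5, 1: -0.5}
```

In this toy case both groups have the same accuracy over the whole corpus, yet each cluster shows a gap of 0.5 in opposite directions, which is the kind of local disparity a corpus-level metric hides.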
Semantic Evaluation for Text-to-SQL with Distilled Test Suites
Zhong, Ruiqi, Yu, Tao, Klein, Dan
We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models. Our method distills a small test suite of databases that achieves high code coverage for the gold query from a large number of randomly generated databases. At evaluation time, it computes the denotation accuracy of the predicted queries on the distilled test suite, hence calculating a tight upper-bound for semantic accuracy efficiently. We use our proposed method to evaluate 21 models submitted to the Spider leaderboard and manually verify that our method is always correct on 100 examples. In contrast, the current Spider metric leads to a 2.5% false negative rate on average and 8.1% in the worst case, indicating that test suite accuracy is needed. Our implementation, along with distilled test suites for eleven Text-to-SQL datasets, is publicly available.
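The idea of checking denotations on several databases can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the authors' implementation: the table schema, the two queries, the hand-written databases, and the helpers `denotation` and `passes_suite` are hypothetical, and results are compared order-insensitively, which only suits queries without `ORDER BY`.

```python
import sqlite3


def denotation(rows, query):
    """Run query on a fresh in-memory database holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = sorted(conn.execute(query).fetchall())  # ignore row order
    conn.close()
    return result


def passes_suite(gold, predicted, suite):
    """A prediction passes only if it matches the gold denotation on every database."""
    return all(denotation(db, gold) == denotation(db, predicted) for db in suite)


gold = "SELECT name FROM t WHERE age > 30"
predicted = "SELECT name FROM t WHERE age >= 30"  # semantically different

single_db = [[("ann", 25), ("bob", 40)]]
suite = single_db + [[("cy", 30), ("di", 35)]]  # adds a boundary-value database

ok_on_single = passes_suite(gold, predicted, single_db)
ok_on_suite = passes_suite(gold, predicted, suite)
print(ok_on_single)  # True
print(ok_on_suite)   # False
```

A single database lets the wrong query pass because no row sits on the `age = 30` boundary. Adding a database that exercises that boundary exposes the difference, which is why a suite gives a tighter upper bound on semantic accuracy than one database does.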
AI Can Detect COVID-19 in the Lungs Like a Virtual Physician, New Study Shows
A University of Central Florida researcher is part of a new study showing that artificial intelligence can be nearly as accurate as a physician in diagnosing COVID-19 in the lungs. The study, recently published in Nature Communications, shows the new technique can also overcome some of the challenges of current testing. Researchers demonstrated that an AI algorithm could be trained to classify COVID-19 pneumonia in computed tomography (CT) scans with up to 90 percent accuracy, as well as correctly identify positive cases 84 percent of the time and negative cases 93 percent of the time. CT scans offer a deeper insight into COVID-19 diagnosis and progression as compared to the often-used reverse transcription-polymerase chain reaction, or RT-PCR, tests. These tests have high false negative rates, delays in processing and other challenges.
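The three figures quoted above correspond to accuracy, sensitivity (share of positive cases correctly identified) and specificity (share of negative cases correctly identified). The sketch below shows how they relate. The confusion counts are hypothetical, chosen only to make the arithmetic visible, and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # positives correctly identified
    specificity = tn / (tn + fp)  # negatives correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts: 50 positive scans and 100 negative scans.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=42, fn=8, tn=93, fp=7)
print(accuracy, sensitivity, specificity)  # 0.9 0.84 0.93
```

Accuracy is a prevalence-weighted mix of the other two, so the same sensitivity and specificity would give a different accuracy on a dataset with a different share of positive cases.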
Automatic CAD-RADS Scoring Using Deep Learning
Denzinger, Felix, Wels, Michael, Breininger, Katharina, Gülsün, Mehmet A., Schöbinger, Max, André, Florian, Buß, Sebastian, Görich, Johannes, Sühling, Michael, Maier, Andreas
Coronary CT angiography (CCTA) has established its role as a non-invasive modality for the diagnosis of coronary artery disease (CAD). The CAD-Reporting and Data System (CAD-RADS) has been developed to standardize communication and aid in decision making based on CCTA findings. The CAD-RADS score is determined by manual assessment of all coronary vessels and the grading of lesions within the coronary artery tree. We propose a bottom-up approach for fully-automated prediction of this score using deep-learning operating on a segment-wise representation of the coronary arteries. The method relies solely on a prior fully-automated centerline extraction and segment labeling and predicts the segment-wise stenosis degree and the overall calcification grade as auxiliary tasks in a multi-task learning setup. We evaluate our approach on a data collection consisting of 2,867 patients. On the task of identifying patients with a CAD-RADS score indicating the need for further invasive investigation our approach reaches an area under curve (AUC) of 0.923 and an AUC of 0.914 for determining whether the patient suffers from CAD. This level of performance enables our approach to be used in a fully-automated screening setup or to assist diagnostic CCTA reading, especially due to its neural architecture design -- which allows comprehensive predictions.
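The AUC values reported above can be read as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up scores. The function name `auc` and the numbers are illustrative assumptions and have no connection to the paper's data.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as a pairwise ranking statistic.

    Counts the fraction of (positive, negative) pairs where the positive
    scores higher; ties count as half a win.
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Toy classifier scores: three positives, two negatives.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

Here five of the six positive/negative pairs are ranked correctly, so `toy_auc` equals 5/6. The pairwise form is quadratic in the number of samples, so it is suitable for illustration rather than for large evaluation sets.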
Modeling Islamist Extremist Communications on Social Media using Contextual Dimensions: Religion, Ideology, and Hate
Kursuncu, Ugur, Gaur, Manas, Castillo, Carlos, Alambo, Amanuel, Thirunarayan, K., Shalin, Valerie, Achilov, Dilshod, Arpinar, I. Budak, Sheth, Amit
Terror attacks have been linked in part to online extremist content. Although tens of thousands of Islamist extremism supporters consume such content, they are a small fraction relative to peaceful Muslims. The efforts to contain the ever-evolving extremism on social media platforms have remained inadequate and mostly ineffective. Divergent extremist and mainstream contexts challenge machine interpretation, with a particular threat to the precision of classification algorithms. Our context-aware computational approach to the analysis of extremist content on Twitter breaks down this persuasion process into building blocks that acknowledge inherent ambiguity and sparsity that likely challenge both manual and automated classification. We model this process using a combination of three contextual dimensions -- religion, ideology, and hate -- each elucidating a degree of radicalization and highlighting independent features to render them computationally accessible. We utilize domain-specific knowledge resources for each of these contextual dimensions, such as the Qur'an for religion, the books of extremist ideologues and preachers for political ideology, and a social media hate speech corpus for hate. Our study makes three contributions to reliable analysis: (i) Development of a computational approach rooted in the contextual dimensions of religion, ideology, and hate that reflects strategies employed by online Islamist extremist groups, (ii) An in-depth analysis of relevant tweet datasets with respect to these dimensions to exclude likely mislabeled users, and (iii) A framework for understanding online radicalization as a process to assist counter-programming. Given the potentially significant social impact, we evaluate the performance of our algorithms to minimize mislabeling, where our approach outperforms a competitive baseline by 10.2% in precision.
Quantifying Statistical Significance of Neural Network Representation-Driven Hypotheses by Selective Inference
Duy, Vo Nguyen Le, Iwazaki, Shogo, Takeuchi, Ichiro
In the past few years, various approaches have been developed to explain and interpret deep neural network (DNN) representations, but it has been pointed out that these representations are sometimes unstable and not reproducible. In this paper, we interpret these representations as hypotheses driven by DNNs (called DNN-driven hypotheses) and propose a method to quantify the reliability of these hypotheses in a statistical hypothesis testing framework. To this end, we introduce the Selective Inference (SI) framework, which has received much attention in the past few years as a new statistical inference framework for data-driven hypotheses. The basic idea of SI is to make conditional inferences on the selected hypotheses under the condition that they are selected. In order to use the SI framework for DNN representations, we develop a new SI algorithm based on the homotopy method, which enables us to derive the exact (non-asymptotic) conditional sampling distribution of the DNN-driven hypotheses. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method can successfully control the false positive rate, has decent performance in terms of computational efficiency, and provides good results in practical applications. The remarkable predictive performance of deep neural networks (DNNs) stems from their ability to learn appropriate representations from data.
Evolving test instances of the Hamiltonian completion problem
Lechien, Thibault, Jooken, Jorik, De Causmaecker, Patrick
Predicting and comparing algorithm performance on graph instances is challenging for multiple reasons. First, there is usually no standard set of instances to benchmark performance. Second, using existing graph generators results in a restricted spectrum of difficulty and the resulting graphs are usually not diverse enough to draw sound conclusions. That is why recent work proposes a new methodology to generate a diverse set of instances by using an evolutionary algorithm. We can then analyze the resulting graphs and get key insights into which attributes are most related to algorithm performance. We can also fill observed gaps in the instance space in order to generate graphs with previously unseen combinations of features. This methodology is applied to the instance space of the Hamiltonian completion problem using two different solvers, namely the Concorde TSP Solver and a multi-start local search algorithm.
Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
Csordás, Róbert, van Steenkiste, Sjoerd, Schmidhuber, Jürgen
Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.