Performance Analysis
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
Tran, Thanh, Hu, Yifan, Hu, Changwei, Yen, Kevin, Tan, Fei, Lee, Kyumin, Park, Serim
We present our HABERTOR model for detecting hatespeech in large-scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture but differs in four aspects: (i) it generates its own vocabularies and is pre-trained from scratch using the largest-scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inference, and lower memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on a large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, compared with BERT, our HABERTOR is 4 to 5 times faster in the training/inference phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for hatespeech classification.
Naïve Bayes Classifier: A pure statistical approach to ML
Learn how Statistics helps in developing Machine Learning models. The purpose of this class is to help you understand the theory behind the popular Naïve Bayes Classifier method used in Machine Learning and to teach you how to implement it in code, using Python. The course is therefore divided into 2 parts: a theoretical one and a practical one. We are also going to implement other popular Machine Learning algorithms and compare their performance with our proposed Naïve Bayes technique. What am I going to get from this course? Learn how to implement other popular Machine Learning models in code and how to compare their performance with a concrete example.
Goodness-of-Fit Test of Mismatched Models for Self-Exciting Processes
Wei, Song, Zhu, Shixiang, Zhang, Minghe, Xie, Yao
We develop a goodness-of-fit (GOF) test for generative models of self-exciting processes by connecting this problem to the classical statistical theory of the quasi-maximum-likelihood estimator (QMLE). We present a non-parametric self-normalizing statistic for the GOF test, the Generalized Score (GS) statistic, and explicitly capture the model misspecification when establishing the asymptotic distribution of the GS statistic. Numerical experiments based on simulation and real data validate our theory and demonstrate the proposed GS test's good performance.
DeepIntent: ImplicitIntent based Android IDS with E2E Deep Learning architecture
Sewak, Mohit, Sahay, Sanjay K., Rathore, Hemant
The Intent in Android plays an important role in inter-process and intra-process communications. The implicit Intents that an application can accept are declared in its manifest and are among the easiest features to extract from an APK. Implicit Intents can even be extracted online and in real time. So far, neither has the feasibility of developing an Intrusion Detection System solely on implicit Intents been explored, nor are any benchmarks available for a malware classifier that is based on implicit Intents alone. We demonstrate that although Intents are implicit and well declared, they can provide very intuitive insights to distinguish malicious from non-malicious applications. We conducted exhaustive experiments with over 40 different end-to-end Deep Learning configurations of Auto-Encoders and Multi-Layer-Perceptron to create a benchmark for a malware classifier that works exclusively on implicit Intents. Using the results from the experiments, we create an intrusion detection system using only implicit Intents and an end-to-end Deep Learning architecture. We obtained an area-under-curve statistic of 0.81 and an accuracy of 77.2%, along with a false-positive rate of 0.11, on the Drebin dataset.
Emergent and Unspecified Behaviors in Streaming Decision Trees
Manapragada, Chaitanya, Webb, Geoffrey I, Salehi, Mahsa, Bifet, Albert
Hoeffding trees are the state-of-the-art methods in decision tree learning for evolving data streams. Because of their efficiency, these very fast decision trees are used in many real applications where data is created in real time. In this work, we extricate explanations for why these streaming decision tree algorithms for stationary and nonstationary streams (HoeffdingTree and HoeffdingAdaptiveTree) work as well as they do. In doing so, we identify thirteen unique unspecified design decisions in both the theoretical constructs and their implementations with substantial and consequential effects on predictive accuracy. These design decisions, without necessarily changing the essence of the algorithms, drive algorithm performance. We begin a larger conversation about explainability not just of the model but also of the processes responsible for an algorithm's success.
ASMFS: Adaptive-Similarity-based Multi-modality Feature Selection for Classification of Alzheimer's Disease
Shi, Yuang, Zu, Chen, Hong, Mei, Zhou, Luping, Wang, Lei, Wu, Xi, Zhou, Jiliu, Zhang, Daoqiang, Wang, Yan
Multimodal classification methods using different modalities of imaging and non-imaging data have great advantages over traditional single-modality-based ones for the diagnosis and prognosis of Alzheimer's disease (AD), as well as mild cognitive impairment (MCI), which is the prodromal stage of AD. With the increasing amount of high-dimensional heterogeneous data to be processed, multi-modality feature selection has become a crucial research direction in medical image analysis. However, traditional methods usually depict the data structure using a fixed, predefined similarity matrix as a prior, which makes it difficult to precisely measure the intrinsic relationship structure across different modalities in high-dimensional spaces. In addition, the neighbors chosen on the basis of the predefined similarity matrix are suboptimal, thus limiting the performance of the subsequent classification task. To overcome these drawbacks, in this paper, we propose a novel multi-modal feature selection method called Adaptive-Similarity-based Multi-modality Feature Selection (ASMFS), which performs adaptive similarity learning and feature selection simultaneously.
A Graph Neural Network based approach for detecting Suspicious Users on Online Social Media
Sharma, Shakshi, Sharma, Rajesh
Online Social Media platforms (such as Twitter and Facebook) are extensively used for spreading news to a wider public effortlessly and at a rapid pace. However, nowadays these platforms are also used to spread rumors and fake news to a large audience in a short time span, which can cause panic, fear, and financial loss to society. Thus, it is important to detect and control these rumors before they spread to the masses. One way to control the spread of these rumors is by identifying possible suspicious users who are often involved in spreading them. Our basic assumption is that users who are often involved in spreading rumors are more likely to be suspicious than users whose involvement in spreading rumors is less. This is due to the fact that users may sometimes post rumor tweets by accident. In this paper, we use the PHEME rumor tweet dataset, which contains rumor and non-rumor tweet information on five incidents, that is, i) Charlie Hebdo, ii) Germanwings crash, iii) Ottawa shooting, iv) Sydney siege, and v) Ferguson. We transform this rumor tweets dataset into a suspicious users dataset before leveraging a Graph Neural Network (GNN) based approach for identifying suspicious users. Specifically, we explore the Graph Convolutional Network (GCN), which is a type of GNN, for identifying suspicious users, and then we compare the GCN results with three other approaches that act as baselines: SVM, RF, and an LSTM-based deep learning architecture. Extensive experiments performed on a real-world dataset, where we achieve up to 0.864 for F1-Score and 0.720 for AUC ROC, show the effectiveness of the GNN-based approach for identifying suspicious users.
Equitable Allocation of Healthcare Resources with Fair Cox Models
Keya, Kamrun Naher, Islam, Rashidul, Pan, Shimei, Stockwell, Ian, Foulds, James R.
Healthcare programs such as Medicaid provide crucial services to vulnerable populations, but due to limited resources, many of the individuals who need these services the most languish on waiting lists. Survival models, e.g. the Cox proportional hazards model, can potentially improve this situation by predicting individuals' levels of need, which can then be used to prioritize the waiting lists. Providing care to those in need can prevent institutionalization for those individuals, which both improves quality of life and reduces overall costs. While the benefits of such an approach are clear, care must be taken to ensure that the prioritization process is fair, or independent of harmful stereotypes based on demographic information. In this work, we develop multiple fairness definitions for survival models and corresponding fair Cox proportional hazards models to ensure equitable allocation of healthcare resources. We demonstrate the utility of our methods in terms of fairness and predictive accuracy on two publicly available survival datasets.
Differentiable Causal Discovery Under Unmeasured Confounding
Bhattacharya, Rohit, Nagarajan, Tushar, Malinsky, Daniel, Shpitser, Ilya
The data drawn from biological, economic, and social systems are often confounded due to the presence of unmeasured variables. Prior work in causal discovery has focused on discrete search procedures for selecting acyclic directed mixed graphs (ADMGs), specifically ancestral ADMGs, that encode ordinary conditional independence constraints among the observed variables of the system. However, confounded systems also exhibit more general equality restrictions that cannot be represented via these graphs, placing a limit on the kinds of structures that can be learned using ancestral ADMGs. In this work, we derive differentiable algebraic constraints that fully characterize the space of ancestral ADMGs, as well as more general classes of ADMGs, arid ADMGs and bow-free ADMGs, that capture all equality restrictions on the observed variables. We use these constraints to cast causal discovery as a continuous optimization problem and design differentiable procedures to find the best fitting ADMG when the data comes from a confounded linear system of equations with correlated errors. We demonstrate the efficacy of our method through simulations and application to a protein expression dataset.
The Effect of Class Imbalance on Precision-Recall Curves
In this note I study how the precision of a classifier depends on the ratio $r$ of positive to negative cases in the test set, as well as the classifier's true and false positive rates. This relationship allows prediction of how the precision-recall curve will change with $r$, which seems not to be well known. It also allows prediction of how $F_{\beta}$ and the Precision Gain and Recall Gain measures of Flach and Kull (2015) vary with $r$.
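The dependence described in this abstract can be illustrated from the standard definitions alone. With $r = P/N$ (positives over negatives), precision is $TP/(TP+FP) = r \cdot TPR / (r \cdot TPR + FPR)$. The sketch below is only an illustration under that definition; the function names `precision_from_rates` and `f_beta_from_rates` are hypothetical and are not taken from the note, and the Precision Gain and Recall Gain measures are not covered.

```python
import math


def precision_from_rates(tpr: float, fpr: float, r: float) -> float:
    """Precision implied by TPR, FPR and the positive/negative ratio r = P/N.

    Follows from TP = TPR * P and FP = FPR * N, dividing through by N.
    """
    denom = r * tpr + fpr
    if denom == 0:
        return math.nan  # no predicted positives, precision is undefined
    return r * tpr / denom


def f_beta_from_rates(tpr: float, fpr: float, r: float, beta: float = 1.0) -> float:
    """F_beta at the same operating point; recall equals TPR."""
    p = precision_from_rates(tpr, fpr, r)
    b2 = beta ** 2
    denom = b2 * p + tpr
    if math.isnan(p) or denom == 0:
        return math.nan
    return (1 + b2) * p * tpr / denom


if __name__ == "__main__":
    # Same classifier (TPR = 0.8, FPR = 0.1) evaluated at different class ratios.
    for r in (1.0, 0.1, 0.01):
        p = precision_from_rates(0.8, 0.1, r)
        f1 = f_beta_from_rates(0.8, 0.1, r)
        print(f"r={r:<5} precision={p:.3f} F1={f1:.3f}")
```

Holding TPR and FPR fixed, precision falls as $r$ shrinks, while recall (the TPR) is unchanged, so each point of the precision-recall curve moves vertically as the class ratio changes.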