Rangwala, Huzefa
Causal Knowledge Guided Societal Event Forecasting
Deng, Songgaojun, Rangwala, Huzefa, Ning, Yue
Data-driven societal event forecasting methods exploit relevant historical information to predict future events. These methods rely on historical labeled data and cannot accurately predict events when data are limited or of poor quality. Studying causal effects between events goes beyond correlation analysis and can contribute to a more robust prediction of events. However, incorporating causality analysis in data-driven event forecasting is challenging due to several factors: (i) Events occur in a complex and dynamic social environment. Many unobserved variables, i.e., hidden confounders, affect both potential causes and outcomes. (ii) Given spatiotemporal non-independent and identically distributed (non-IID) data, modeling hidden confounders for accurate causal effect estimation is not trivial. In this work, we introduce a deep learning framework that integrates causal effect estimation into event forecasting. We first study the problem of Individual Treatment Effect (ITE) estimation from observational event data with spatiotemporal attributes and present a novel causal inference model to estimate ITEs. We then incorporate the learned event-related causal information into event prediction as prior knowledge. Two robust learning modules, including a feature reweighting module and an approximate constraint loss, are introduced to enable prior knowledge injection. We evaluate the proposed causal inference model on real-world event datasets and validate the effectiveness of proposed robust learning modules in event prediction by feeding learned causal information into different deep learning methods. Experimental results demonstrate the strengths of the proposed causal inference model for ITE estimation in societal events and showcase the beneficial properties of robust learning modules in societal event forecasting.
Learning Smooth and Fair Representations
Gitiaux, Xavier, Rangwala, Huzefa
Organizations that own data face increasing legal liability for its discriminatory use against protected demographic groups, extending to contractual transactions involving third parties access and use of the data. This is problematic, since the original data owner cannot ex-ante anticipate all its future uses by downstream users. This paper explores the upstream ability to preemptively remove the correlations between features and sensitive attributes by mapping features to a fair representation space. Our main result shows that the fairness measured by the demographic parity of the representation distribution can be certified from a finite sample if and only if the chi-squared mutual information between features and representations is finite. Empirically, we find that smoothing the representation distribution provides generalization guarantees of fairness certificates, which improves upon existing fair representation learning approaches. Moreover, we do not observe that smoothing the representation distribution degrades the accuracy of downstream tasks compared to state-of-the-art methods in fair representation learning.
Diversity-Based Generalization for Neural Unsupervised Text Classification under Domain Shift
Krishnan, Jitin, Purohit, Hemant, Rangwala, Huzefa
Domain adaptation approaches seek to learn from a source domain and generalize it to an unseen target domain. At present, the state-of-the-art domain adaptation approaches for subjective text classification problems are semi-supervised; and use unlabeled target data along with labeled source data. In this paper, we propose a novel method for domain adaptation of single-task text classification problems based on a simple but effective idea of diversity-based generalization that does not require unlabeled target data. Diversity plays the role of promoting the model to better generalize and be indiscriminate towards domain shift by forcing the model not to rely on same features for prediction. We apply this concept on the most explainable component of neural networks, the attention layer. To generate sufficient diversity, we create a multi-head attention model and infuse a diversity constraint between the attention heads such that each head will learn differently. We further expand upon our model by tri-training and designing a procedure with an additional diversity constraint between the attention heads of the tri-trained classifiers. Extensive evaluation using the standard benchmark dataset of Amazon reviews and a newly constructed dataset of Crisis events shows that our fully unsupervised method matches with the competing semi-supervised baselines. Our results demonstrate that machine learning architectures that ensure sufficient diversity can generalize better; encouraging future research to design ubiquitously usable learning models without using unlabeled target data.
Sign Language Recognition Analysis using Multimodal Data
Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Kosecka, Jana, Rangwala, Huzefa
Voice-controlled personal and home assistants (such as the Amazon Echo and Apple Siri) are becoming increasingly popular for a variety of applications. However, the benefits of these technologies are not readily accessible to Deaf or Hard-ofHearing (DHH) users. The objective of this study is to develop and evaluate a sign recognition system using multiple modalities that can be used by DHH signers to interact with voice-controlled devices. With the advancement of depth sensors, skeletal data is used for applications like video analysis and activity recognition. Despite having similarity with the well-studied human activity recognition, the use of 3D skeleton data in sign language recognition is rare. This is because unlike activity recognition, sign language is mostly dependent on hand shape pattern. In this work, we investigate the feasibility of using skeletal and RGB video data for sign language recognition using a combination of different deep learning architectures. We validate our results on a large-scale American Sign Language (ASL) dataset of 12 users and 13107 samples across 51 signs. It is named as GMUASL51. We collected the dataset over 6 months and it will be publicly released in the hope of spurring further machine learning research towards providing improved accessibility for digital assistants.
Federated Multi-task Hierarchical Attention Model for Sensor Analytics
Chen, Yujing, Ning, Yue, Chai, Zheng, Rangwala, Huzefa
Sensors are an integral part of modern Internet of Things (IoT) applications. There is a critical need for the analysis of heterogeneous multivariate temporal data obtained from the individual sensors of these systems. In this paper we particularly focus on the problem of the scarce amount of training data available per sensor. We propose a novel federated multi-task hierarchical attention model (FATHOM) that jointly trains classification/regression models from multiple sensors. The attention mechanism of the proposed model seeks to extract feature representations from the input and learn a shared representation focused on time dimensions across multiple sensors. The underlying temporal and non-linear relationships are modeled using a combination of attention mechanism and long-short term memory (LSTM) networks. We find that our proposed method outperforms a wide range of competitive baselines in both classification and regression settings on activity recognition and environment monitoring datasets. We further provide visualization of feature representations learned by our model at the input sensor level and central time level.
Multi-Differential Fairness Auditor for Black Box Classifiers
Gitiaux, Xavier, Rangwala, Huzefa
Machine learning algorithms are increasingly involved in sensitive decision-making process with adversarial implications on individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness. Multi-differential fairness is a guarantee that a black box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that individuals identified as African-American with little criminal history are three-times more likely to be considered at high risk of violent recidivism than similar individuals but not African-American.
Reliable Deep Grade Prediction with Uncertainty Estimation
Hu, Qian, Rangwala, Huzefa
Currently, college-going students are taking longer to graduate than their parental generations. Further, in the United States, the six-year graduation rate has been 59% for decades. Improving the educational quality by training better-prepared students who can successfully graduate in a timely manner is critical. Accurately predicting students' grades in future courses has attracted much attention as it can help identify at-risk students early so that personalized feedback can be provided to them on time by advisors. Prior research on students' grade prediction include shallow linear models; however, students' learning is a highly complex process that involves the accumulation of knowledge across a sequence of courses that can not be sufficiently modeled by these linear models. In addition to that, prior approaches focus on prediction accuracy without considering prediction uncertainty, which is essential for advising and decision making. In this work, we present two types of Bayesian deep learning models for grade prediction. The MLP ignores the temporal dynamics of students' knowledge evolution. Hence, we propose RNN for students' performance prediction. To evaluate the performance of the proposed models, we performed extensive experiments on data collected from a large public university. The experimental results show that the proposed models achieve better performance than prior state-of-the-art approaches. Besides more accurate results, Bayesian deep learning models estimate uncertainty associated with the predictions. We explore how uncertainty estimation can be applied towards developing a reliable educational early warning system. In addition to uncertainty, we also develop an approach to explain the prediction results, which is useful for advisors to provide personalized feedback to students.
Classifying Documents within Multiple Hierarchical Datasets using Multi-Task Learning
Naik, Azad, Charuvaka, Anveshi, Rangwala, Huzefa
Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms the traditional Single task learning (STL) in terms of prediction accuracy. In this work we develop an MTL based approach for classifying documents that are archived within dual concept hierarchies, namely, DMOZ and Wikipedia. We solve the multi-class classification problem by defining one-versus-rest binary classification tasks for each of the different classes across the two hierarchical datasets. Instead of learning a linear discriminant for each of the different tasks independently, we use a MTL approach with relationships between the different tasks across the datasets established using the non-parametric, lazy, nearest neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL (and TL) methods against the standard single task learning and semi-supervised learning approaches. Our empirical results demonstrate the strength of our developed methods that show an improvement especially when there are fewer number of training examples per classification task.
Inconsistent Node Flattening for Improving Top-down Hierarchical Classification
Naik, Azad, Rangwala, Huzefa
Large-scale classification of data where classes are structurally organized in a hierarchy is an important area of research. Top-down approaches that exploit the hierarchy during the learning and prediction phase are efficient for large scale hierarchical classification. However, accuracy of top-down approaches is poor due to error propagation i.e., prediction errors made at higher levels in the hierarchy cannot be corrected at lower levels. One of the main reason behind errors at the higher levels is the presence of inconsistent nodes that are introduced due to the arbitrary process of creating these hierarchies by domain experts. In this paper, we propose two different data-driven approaches (local and global) for hierarchical structure modification that identifies and flattens inconsistent nodes present within the hierarchy. Our extensive empirical evaluation of the proposed approaches on several image and text datasets with varying distribution of features, classes and training instances per class shows improved classification performance over competing hierarchical modification approaches. Specifically, we see an improvement upto 7% in Macro-F1 score with our approach over best TD baseline. SOURCE CODE: http://www.cs.gmu.edu/~mlbio/InconsistentNodeFlattening
Embedding Feature Selection for Large-scale Hierarchical Classification
Naik, Azad, Rangwala, Huzefa
Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features posing several big data challenges. Feature selection that aims to select the subset of discriminant features is an effective strategy to deal with large-scale HC problem. It speeds up the training process, reduces the prediction time and minimizes the memory requirements by compressing the total size of learned model weight vectors. Majority of the studies have also shown feature selection to be competent and successful in improving the classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distribution of features, classes and instances shows upto 3x order of speed-up on massive datasets and upto 45% less memory requirements for storing the weight vectors of learned model without any significant loss (improvement for some datasets) in the classification accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection.