
Collaborating Authors

 Gupta, Soumyajit


Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

arXiv.org Artificial Intelligence

In developing natural language processing (NLP) models to detect toxic language (Arango et al., 2019; Schmidt and Wiegand, 2017; Vaidya et al., 2020), we typically assume that toxic language manifests in similar forms across different targeted groups. For example, HateCheck (Röttger et al., 2021) enumerates templatic patterns such as "I hate [GROUP]" that we expect detection models to handle robustly across groups. Moreover, we typically pool data across different demographic targets in model training in order to learn general patterns of linguistic toxicity across diverse demographic targets. However, the nature and form of toxic language used to target different demographic groups can vary quite markedly. Furthermore, an imbalanced distribution of different demographic groups in toxic language datasets risks over-fitting to the forms of toxic language most relevant to the majority group(s), potentially at the expense of systematically weaker model performance on minority group(s). For this reason, a "one-size-fits-all" modeling approach may yield sub-optimal performance and, more specifically, raise concerns of algorithmic fairness (Arango et al., 2019; Park et al., 2018; Sap et al., 2019). At the same time, radically siloing off datasets for each different demographic target group would prevent models from learning broader linguistic patterns of toxicity across the different demographic groups targeted.
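The sketch below is only an illustration of the conditional multi-task idea described above, not the paper's actual model: a shared encoder pools general toxicity signal across all data, while per-group classification heads allow group-specific decision boundaries. All names (ConditionalToxicityClassifier, the group labels, the layer sizes) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): shared encoder + one
# toxicity head per targeted demographic group, trading off pooled learning
# against group-specific modeling.
import torch
import torch.nn as nn

class ConditionalToxicityClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=64,
                 groups=("group_a", "group_b")):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)   # shared bag-of-embeddings encoder
        self.shared = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # one binary toxicity head per targeted demographic group
        self.heads = nn.ModuleDict({g: nn.Linear(hidden_dim, 1) for g in groups})

    def forward(self, token_ids, group):
        h = self.shared(self.embed(token_ids))
        return self.heads[group](h)                           # toxicity logit from that group's head

model = ConditionalToxicityClassifier()
tokens = torch.randint(0, 30000, (4, 20))                     # toy batch of token ids
logits = model(tokens, group="group_a")
```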


Tail-Net: Extracting Lowest Singular Triplets for Big Data Applications

arXiv.org Artificial Intelligence

Singular Value Decomposition (SVD) serves as an exploratory tool for identifying the dominant features of a dataset in the form of the top rank-r singular factors corresponding to the largest singular values. For Big Data applications, however, it is well known that full SVD is restrictive due to its main memory requirements. At the same time, a number of applications such as community detection, clustering, or bottleneck identification in large-scale graph datasets rely upon identifying the lowest singular values and the corresponding singular vectors. For example, the lowest singular values of a graph Laplacian reveal the number of isolated clusters (zero singular values) or bottlenecks (lowest non-zero singular values) for undirected, acyclic graphs. A naive approach would be to perform a full SVD; however, this quickly becomes infeasible for practical Big Data applications due to the enormous memory requirements. Furthermore, for such applications only a few of the lowest singular factors are desired, making a full decomposition computationally exorbitant. In this work, we extend the previously proposed Range-Net to Tail-Net for memory- and compute-efficient extraction of the lowest rank-r singular factors of a given big dataset, for a specified rank r. We present a number of numerical experiments on both synthetic and practical datasets for verification and benchmarking, using conventional SVD as the baseline.
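A minimal sketch of the motivating example: for a small undirected graph, the number of (near-)zero singular values of its Laplacian equals the number of isolated clusters. The full SVD here plays the role of the conventional baseline mentioned above; Tail-Net targets the same tail factors without forming the full decomposition. The toy adjacency matrix and threshold are assumptions for illustration.

```python
# Count isolated clusters from the zero singular values of a graph Laplacian,
# using a full SVD as the (memory-heavy) baseline.
import numpy as np

A = np.array([[0, 1, 0, 0],      # adjacency of two disconnected edges: {0,1} and {2,3}
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # unnormalized graph Laplacian

singular_values = np.linalg.svd(L, compute_uv=False)
num_clusters = int(np.sum(singular_values < 1e-10))
print(singular_values)                       # [2., 2., 0., 0.]
print("isolated clusters:", num_clusters)    # 2
```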


Streaming Singular Value Decomposition for Big Data Applications

arXiv.org Artificial Intelligence

Singular Value Decomposition (SVD) plays a pivotal role in exploratory data analysis. However, in a Big Data setting, computing the dominant singular vectors is often restrictive due to the main memory requirements imposed by the dataset. Recently introduced randomized projection schemes attempt to mitigate this memory load by constructing approximate projections of the true dataset in a streaming setting. However, these projection methods come at the cost of approximation errors in both the top singular values and vectors. Furthermore, in order to bound the approximation error, an over-sampled projection is required, often much larger in dimension than the desired rank. This over-sampling can still be memory-intensive when the data dimension is large, or extraneous when the desired rank approximation is close to the full rank. We present a two-stage neural optimization approach as an alternative to conventional and randomized SVD techniques, where the memory requirement depends explicitly on the feature dimension and desired rank, independent of the sample size. The proposed scheme reads data samples in a streaming setting, with the network minimization problem converging to a low-rank approximation with high precision. Our architecture is fully interpretable, in that all network outputs and weights have a specific meaning. We evaluate our results on various performance metrics against state-of-the-art streaming methods. We also present numerical experiments for singular value and eigenvalue decomposition on real data at various scales to show the memory efficiency of our proposed approach.
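The following is a hedged sketch of the general idea only, not the paper's two-stage architecture: a d x r matrix W is learned by streaming minibatches and minimizing the batch reconstruction error, so memory scales with the feature dimension d and rank r rather than the number of samples. The data generator, learning rate, and batch sizes are assumptions for illustration.

```python
# Streaming low-rank approximation sketch: optimize a d x r factor W over
# minibatches so that X @ W @ W.T reconstructs each batch, keeping only
# O(d * r) state in memory.
import torch

d, r = 50, 5
W = torch.randn(d, r, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)

def stream_batches(num_batches=500, batch_size=64):
    basis = torch.randn(d, r)                   # toy low-rank data source
    for _ in range(num_batches):
        yield torch.randn(batch_size, r) @ basis.T

for X in stream_batches():
    opt.zero_grad()
    X_hat = X @ W @ W.T                         # rank-r reconstruction of the batch
    loss = ((X - X_hat) ** 2).mean()
    loss.backward()
    opt.step()

# After training, the span of W approximates the dominant rank-r right singular
# subspace of the streamed data.
```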


Prevention is Better than Cure: Handling Basis Collapse and Transparency in Dense Networks

arXiv.org Machine Learning

Dense nets are an integral part of any classification and regression problem. Recently, these networks have found a new application as solvers for known representations in various domains. However, one crucial issue with dense nets is the interpretability of their features and their lack of reproducibility over multiple training runs. In this work, we identify a basis collapse issue as a primary cause and propose a modified loss function that circumvents this problem. We also provide a few general guidelines relating the choice of activations to loss surface roughness and appropriate scaling for designing low-weight dense nets. We demonstrate through carefully chosen numerical experiments that the basis collapse issue leads to the design of massively redundant networks. Our approach results in substantially more concise nets, having $100\times$ fewer parameters, while achieving a much lower ($10\times$) MSE loss at scale than reported in prior works. Further, we show that the width of a dense net is acutely dependent on the feature complexity. This is in contrast to the dimension-dependent width choice reported in prior theoretical works. To the best of our knowledge, this is the first time these issues and contradictions have been reported and experimentally verified. With our design guidelines we render the resulting low-weight network designs transparent. We share our code for full reproducibility at https://github.com/smjtgupta/Dense_Net_Regress.
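As an illustrative diagnostic only (the paper's modified loss is not reproduced here), one way to see the redundancy that basis collapse induces is to compare a hidden layer's nominal width with the effective rank of its weight matrix: a long tail of near-zero singular values indicates many redundant hidden units. The toy construction and threshold below are assumptions.

```python
# Effective-rank check on a hidden layer's weights: a wide layer whose rows
# mostly repeat a few directions has far fewer independent basis vectors than
# its nominal width suggests.
import numpy as np

rng = np.random.default_rng(0)
hidden_width, in_dim, true_rank = 256, 10, 4
basis = rng.standard_normal((true_rank, in_dim))             # a few genuine directions
W = rng.standard_normal((hidden_width, true_rank)) @ basis   # collapsed, redundant weight matrix

s = np.linalg.svd(W, compute_uv=False)
effective_rank = int(np.sum(s > 1e-8 * s[0]))
print("nominal width:", hidden_width, "effective rank:", effective_rank)   # 256 vs. 4
```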


TIME: A Transparent, Interpretable, Model-Adaptive and Explainable Neural Network for Dynamic Physical Processes

arXiv.org Machine Learning

Partial Differential Equations are infinite-dimensional encoded representations of physical processes. However, assimilating multiple sources of observation data into a coupled representation presents significant challenges. We present a fully convolutional architecture that captures the invariant structure of the domain to reconstruct the observable system. The proposed architecture is significantly lower-weight than other networks proposed for such problems. Our intent is to learn coupled dynamic processes, interpreted as deviations from the true kernels representing isolated processes, in order to achieve model-adaptivity. Experimental analysis shows that our architecture is robust and transparent in capturing process kernels and system anomalies. We also show that a high-weight representation is not only redundant but also harms network interpretability. Our design is guided by domain knowledge, with isolated process representations serving as ground truths for verification. These allow us to identify redundant kernels and their manifestations in activation maps, guiding designs that are both interpretable and explainable, unlike traditional deep nets.
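A hedged sketch of the underlying idea (not the TIME architecture itself): a single convolutional kernel can be fit to reproduce an isolated process operator, here the discrete Laplacian stencil of a diffusion term, which then serves as a ground truth against which learned kernels and their deviations can be inspected. The field size, step count, and learning rate are assumptions for illustration.

```python
# Fit a 3x3 convolutional kernel to an isolated process operator (the 5-point
# Laplacian stencil) applied to random fields; the known stencil acts as the
# ground truth for verifying what the learned kernel represents.
import torch
import torch.nn.functional as F

laplacian = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

kernel = torch.zeros(1, 1, 3, 3, requires_grad=True)
opt = torch.optim.Adam([kernel], lr=1e-2)

for _ in range(2000):
    u = torch.randn(8, 1, 32, 32)                    # random scalar fields
    target = F.conv2d(u, laplacian, padding=1)       # isolated diffusion operator
    pred = F.conv2d(u, kernel, padding=1)
    loss = F.mse_loss(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(kernel.detach().squeeze())                     # approximately recovers the 5-point stencil
```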