Singh, Neeraj Kumar
xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods
Seth, Pratinav, Rathore, Yashwardhan, Singh, Neeraj Kumar, Chitroda, Chintan, Sankarapu, Vinay Kumar
The increasing complexity of machine learning (ML) and deep learning (DL) models has led to their widespread adoption in numerous real-world applications. However, as these models become more powerful, they also become less interpretable. In particular, deep neural networks (DNNs), which have achieved state-of-the-art performance in tasks such as image recognition, natural language processing, and autonomous driving, are often viewed as "black box" models due to their complexity and lack of transparency. Interpretability is essential, particularly in high-stakes fields where the consequences of incorrect or non-explainable decisions can be profound. In domains such as healthcare, finance, and law, it is not only crucial that AI systems make accurate predictions but also that these predictions can be understood and justified by human stakeholders. For example, in healthcare, understanding why a model predicts a certain diagnosis can be as important as the prediction itself, influencing clinical decisions and patient outcomes.
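The abstract above motivates the need to evaluate explanations but does not describe the xai_evals API itself. As a hedged illustration of the kind of quantitative check such a framework targets, the sketch below runs a deletion-style faithfulness test on a local explanation for a tabular model; all names, the occlusion-style attribution, and the baseline choice are illustrative assumptions, not the library's actual interface.

```python
# Minimal sketch (not the xai_evals API): a deletion-style faithfulness check
# for a local explanation on tabular data. All names here are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                                   # instance to explain
baseline = X.mean(axis=0)                  # "removed" features revert to the mean

# Toy attribution: per-feature drop in predicted probability when the feature
# is replaced by its baseline value (a crude occlusion explanation).
p_full = model.predict_proba(x.reshape(1, -1))[0, 1]
attributions = np.array([
    p_full - model.predict_proba(
        np.where(np.arange(X.shape[1]) == j, baseline, x).reshape(1, -1)
    )[0, 1]
    for j in range(X.shape[1])
])

# Deletion curve: remove features in order of attributed importance and track
# how quickly the prediction degrades (a faster drop suggests a more faithful
# explanation).
order = np.argsort(-np.abs(attributions))
x_perturbed, curve = x.copy(), [p_full]
for j in order[:10]:
    x_perturbed[j] = baseline[j]
    curve.append(model.predict_proba(x_perturbed.reshape(1, -1))[0, 1])
print("deletion curve:", np.round(curve, 3))
```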
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models
Sankarapu, Vinay Kumar, Chitroda, Chintan, Rathore, Yashwardhan, Singh, Neeraj Kumar, Seth, Pratinav
The rapid advancement of artificial intelligence has led to increasingly sophisticated deep learning models, which frequently operate as opaque 'black boxes' with limited transparency in their decision-making processes. This lack of interpretability presents considerable challenges, especially in high-stakes applications where understanding the rationale behind a model's outputs is as essential as the outputs themselves. This study addresses the pressing need for interpretability in AI systems, emphasizing its role in fostering trust, ensuring accountability, and promoting responsible deployment in mission-critical fields. To address the interpretability challenge in deep learning, we introduce DLBacktrace, an innovative technique developed by the AryaXAI team to illuminate model decisions across a wide array of domains, including simple Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Large Language Models (LLMs), computer vision models, and more. We provide a comprehensive overview of the DLBacktrace algorithm and present benchmarking results, comparing its performance against established interpretability methods such as SHAP, LIME, GradCAM, Integrated Gradients, SmoothGrad, and Attention Rollout, using diverse task-based metrics. The proposed DLBacktrace technique is compatible with various model architectures built in PyTorch and TensorFlow, supporting models like Llama 3.2, other NLP architectures such as BERT and LSTMs, computer vision models like ResNet and U-Net, as well as custom deep neural network (DNN) models for tabular data. This flexibility underscores DLBacktrace's adaptability and effectiveness in enhancing model transparency across a broad spectrum of applications. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace.
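DLBacktrace's own API is not reproduced in the abstract; for its actual usage, see the linked repository. As context for the benchmarks mentioned above, the sketch below shows one of the named baseline methods, Integrated Gradients, applied via Captum to a toy PyTorch MLP; the model and inputs are placeholders, not the paper's experimental setup.

```python
# Illustrative baseline only: Integrated Gradients via Captum on a small
# PyTorch MLP, one of the comparison methods named in the abstract.
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

model = nn.Sequential(              # toy stand-in for a tabular DNN
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 2),
)
model.eval()

x = torch.randn(1, 8)               # a single input to explain
baseline = torch.zeros_like(x)      # reference point for the path integral

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x, baselines=baseline, target=1, return_convergence_delta=True
)
print(attributions)   # per-feature relevance for class 1
print(delta)          # completeness-axiom error of the approximation
```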
HCDIR: End-to-end Hate Context Detection, and Intensity Reduction model for online comments
Singh, Neeraj Kumar, Ghosh, Koyel, Mahapatra, Joy, Garain, Utpal, Senapati, Apurbalal
Warning: This paper contains examples of language that some people may find offensive. Detecting and reducing hateful, abusive, and offensive comments is a critical and challenging task on social media. Moreover, few studies aim to mitigate the intensity of hate speech. While studies have shown that context-level semantics are crucial for detecting hateful comments, most of this research focuses on English due to the ample datasets available. In contrast, low-resource languages, such as Indian languages, remain under-researched because of limited datasets. Unlike hate speech detection, hate intensity reduction remains unexplored in both high-resource and low-resource languages. In this paper, we propose a novel end-to-end model, HCDIR, for Hate Context Detection and Hate Intensity Reduction in social media posts. First, we fine-tuned several pre-trained language models to detect hateful comments and ascertain the best-performing detection model. Then, we identified the contextual hateful words; this identification is justified through a state-of-the-art explainability method, Integrated Gradients (IG). Lastly, a Masked Language Modeling (MLM) model is employed to capture domain-specific nuances and reduce hate intensity. We masked 50% of the hateful words in comments identified as hateful and predicted alternative words for these masked terms to generate convincing sentences. The optimal replacement for the original hateful comment is then selected from these feasible sentences. Extensive experiments have been conducted on several recent datasets using automatic metric-based evaluation (BERTScore) and thorough human evaluation. To enhance the faithfulness of the human evaluation, we engaged a group of three human annotators with varied expertise.
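To make the masking-and-replacement step above concrete, the sketch below masks a flagged word and fills it with a masked language model, then picks a candidate by BERTScore similarity to the original sentence. The example sentence, the choice of words assumed to be flagged by Integrated Gradients, the model checkpoints, and the selection rule are all illustrative assumptions, not the paper's exact HCDIR setup.

```python
# Hedged sketch of the mask-and-replace step, using a benign example sentence.
# Flagged words, model choices, and the selection rule are illustrative.
from transformers import pipeline
from bert_score import score

fill = pipeline("fill-mask", model="bert-base-uncased")

original = "you are a terrible and useless person"
# Suppose Integrated Gradients flagged "terrible" and "useless"; mask half of them.
masked = "you are a terrible and [MASK] person"

# Top MLM predictions for the masked slot, as full candidate sentences.
candidates = [pred["sequence"] for pred in fill(masked)[:5]]

# Keep the candidate closest in meaning to the original (BERTScore F1),
# one plausible stand-in for selecting the "optimal replacement".
_, _, f1 = score(candidates, [original] * len(candidates), lang="en")
best = candidates[int(f1.argmax())]
print(best)
```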