Interpreting the Prediction of BERT Model for Text Classification
Integrated gradients is a method to compute the attribution of each feature of a deep learning model based on the gradient of the model's output (prediction) with respect to the input. This method applies to any deep learning model for classification and regression tasks. As an example, let's say that we have a text classification model and we want to interpret its prediction. With integrated gradients, in the end, we will get the attribution score of each input word with respect to the final prediction. We can use this attribution score to find out which words play an important role in our model's final prediction.
Dec-20-2022, 18:45:12 GMT
- Technology: