Inseq: An Interpretability Toolkit for Sequence Generation Models

Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza

May-27-2023–arXiv.org Artificial Intelligence

Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformer architectures. We showcase its potential by adopting it to highlight gender biases in machine translation models and locate factual knowledge inside GPT-2. Thanks to its extensible interface supporting cutting-edge techniques such as contrastive feature attribution, Inseq can drive future advances in explainable natural language generation, centralizing good practices and enabling fair and reproducible model evaluations.
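The idea behind contrastive feature attribution mentioned in the abstract can be illustrated with a minimal, library-free sketch. It does not use Inseq's API and is not the authors' implementation: the toy linear scorer `toy_logits`, the weight matrix `W`, and the helper `contrastive_input_x_gradient` are hypothetical names introduced only for this illustration. The sketch assigns each input feature a score equal to its value times the gradient of the difference between the target logit and a foil logit, with the gradient estimated by central finite differences.

```python
from typing import Callable, List, Sequence


def contrastive_input_x_gradient(
    score_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    target: int,
    foil: int,
    eps: float = 1e-5,
) -> List[float]:
    """Input-times-gradient attribution of score[target] - score[foil].

    The gradient is approximated with central finite differences, so the
    helper works for any scoring function, not only the toy one below.
    """

    def diff(v: Sequence[float]) -> float:
        scores = score_fn(v)
        return scores[target] - scores[foil]

    attributions = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad_i = (diff(up) - diff(down)) / (2 * eps)
        attributions.append(x[i] * grad_i)
    return attributions


# Hypothetical toy "model": 3 candidate output tokens, 4 input features.
W = [
    [0.5, -1.0, 2.0, 0.0],
    [0.5, 1.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def toy_logits(x: Sequence[float]) -> List[float]:
    """Linear scorer standing in for a generation model's output logits."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


x = [1.0, 1.0, 1.0, 0.0]
# Which input features favor output 0 (target) over output 1 (foil)?
attr = contrastive_input_x_gradient(toy_logits, x, target=0, foil=1)
print([round(a, 4) for a in attr])
```

In this toy case the first feature contributes equally to both outputs and therefore receives a score near zero, which is the point of the contrastive formulation: only features that distinguish the target from the foil are highlighted. For a linear scorer the scores also sum to the logit difference itself.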

inseq, interpretability toolkit, sequence generation model

Genre:
- Research Report (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (0.53)
    - Generation (0.53)
  - Machine Learning > Neural Networks
    - Deep Learning (0.53)
