Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training

Aug-18-2025–arXiv.org Artificial Intelligence

We propose an end-to-end differentiable training paradigm for stable training of a rationalized transformer classifier. Our approach results in a single model that simultaneously classifies a sample and scores input tokens based on their relevance to the classification. To this end, we build on the widely-used three-player-game for training rationalized models, which typically relies on training a rationale selector, a classifier and a complement classifier. We simplify this approach by making a single model fulfill all three roles, leading to a more efficient training paradigm that is not susceptible to the common training instabilities that plague existing approaches. Further, we extend this paradigm to produce class-wise rationales while incorporating recent advances in parameterizing and regularizing the resulting rationales, thus leading to substantially improved and state-of-the-art alignment with human annotations without any explicit supervision.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Aug-18-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > Minnesota (0.28)

Genre:
- Research Report (0.64)

Industry:
- Health & Medicine > Therapeutic Area (0.34)
- Media > Film (0.30)
- Leisure & Entertainment (0.30)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)