Evaluating Paraphrastic Robustness in Textual Entailment Models

Verma, Dhruv, Lal, Yash Kumar, Sinha, Shreyashee, Van Durme, Benjamin, Poliak, Adam

Jun-29-2023 – arXiv.org Artificial Intelligence

We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased. In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.
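The consistency measure described above (how often a model's prediction changes between an example and its paraphrase) can be illustrated with a minimal sketch. The function name `prediction_change_rate`, the label strings, and the toy predictions below are all hypothetical; this is not the authors' evaluation code, and the abstract does not specify how their figure is computed beyond the share of paraphrased examples whose prediction changes.

```python
from typing import Sequence


def prediction_change_rate(
    original_preds: Sequence[str], paraphrase_preds: Sequence[str]
) -> float:
    """Fraction of example pairs whose predicted label differs
    between the original input and its paraphrase.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(original_preds) != len(paraphrase_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("prediction lists must not be empty")
    # Count pairs where the label assigned to the paraphrase differs.
    changed = sum(o != p for o, p in zip(original_preds, paraphrase_preds))
    return changed / len(original_preds)


# Toy predictions (made up for illustration, not data from the paper).
original = ["entailed", "not-entailed", "entailed", "entailed"]
paraphrased = ["entailed", "entailed", "entailed", "not-entailed"]

rate = prediction_change_rate(original, paraphrased)
print(f"Predictions changed on {rate:.0%} of paraphrased examples")
```

A rate of zero would mean the model is fully consistent under paraphrase on the evaluated pairs. Note that this measures consistency only and says nothing about whether either prediction is correct.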

computational linguistics, machine learning, natural language

Country:
- South America > Brazil
  - Rio Grande do Norte > Natal (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.14)
    - New York > Suffolk County
      - Stony Brook (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Middle East
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Qatar > Ad-Dawhah
      - Doha (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
