Changing Answer Order Can Decrease MMLU Accuracy
Vipul Gupta, David Pantoja, Candace Ross, Adina Williams, Megan Ung
– arXiv.org Artificial Intelligence
NLP model accuracy has been shown to be fairly brittle. For example, accuracy can drop when researchers apply input alterations based on paraphrasing (Gan and Ng, 2019), word order changes (Gauthier and Levy, 2019; Ribeiro et al., 2020; Sinha et al., 2021a, 2022; Allen-Zhu and Li, 2023a,b; Berglund et al., 2023; Golovneva et al., 2024; Kitouni et al., 2024), or other minor, largely meaning-preserving input variations or perturbations (Belinkov and Bisk, 2018; Ebrahimi et al., 2018; Jiang et al., 2020; Gao et al., 2021; Li et al., 2021; Sinha et al., 2021b; Moradi and Samwald, 2021; Papakipos and Bitton, 2022; Qian et al., 2022; Goodarzi et al., 2023; Sinha et al., 2023).

Human performance on multiple choice tests can likewise be affected, for example, when answers are presented in a different order during retest (Krosnick and Fabrigar, 1991; Tellinghuisen and Sulikowski, 2008; Lions et al., 2022). However, as models do not have the biological limitations of humans, we may expect them to exhibit less variation than humans, or possibly even none at all. Thus, we claim that a model should be robust to answer order changes: if it gets the correct answer to a question when the answer is labeled 'A', it should also always get the correct answer when it is labeled 'C'. Put another way, the model should select the same answer for each question, regardless of its label, for every possible answer ordering.
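The robustness criterion described above is straightforward to operationalize. Below is a minimal sketch (not the authors' code) of such a check for a single four-option question; `model_choose` is a hypothetical callable standing in for an actual model query that takes a prompt and returns the letter the model picks.

```python
from itertools import permutations
from typing import Callable, Sequence

LABELS = "ABCD"

def order_robustness(question: str,
                     options: Sequence[str],
                     correct: str,
                     model_choose: Callable[[str], str]) -> dict:
    """Re-ask one 4-option question under every answer ordering and check
    whether the model keeps selecting the same underlying answer."""
    picks = []  # underlying answer text the model chose under each ordering
    for perm in permutations(options):  # 4! = 24 orderings
        prompt = question + "\n" + "\n".join(
            f"{label}. {text}" for label, text in zip(LABELS, perm))
        label = model_choose(prompt)             # hypothetical model call, e.g. returns "B"
        picks.append(perm[LABELS.index(label)])  # map the letter back to its answer text
    return {
        "consistent": len(set(picks)) == 1,  # same answer under every labeling?
        "accuracy": sum(p == correct for p in picks) / len(picks),
    }
```

A model satisfying the robustness claim would return `consistent=True` on every question, so its accuracy would be unaffected by relabeling; the share of questions where it is inconsistent is one way to quantify the brittleness the excerpt describes.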
Jun-27-2024
- Country:
- Asia > Middle East
- UAE (0.14)
- North America
- Canada (0.28)
- United States > California (0.14)
- Genre:
- Research Report (0.82)
- Industry:
- Education (1.00)