BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
Baktash Ansari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi
arXiv.org Artificial Intelligence
This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multiple-choice questions that challenge models to think "outside the box". We fine-tune two models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with six large language models, including GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a "round table conference" of multiple agents for zero-shot learning, to generate consensus answers among three selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.
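The consensus step in ReConcile can be illustrated as a majority vote over the answers returned by the participating agents. The sketch below is a minimal illustration under that assumption, not the authors' implementation; the agent names and answers are placeholder data:

```python
from collections import Counter


def consensus_answer(answers: dict) -> str:
    """Return the multiple-choice option most agents agree on.

    `answers` maps agent name -> chosen option; ties are broken by
    the order in which options were first seen.
    """
    counts = Counter(answers.values())
    return counts.most_common(1)[0][0]


# Hypothetical round-table outputs from three agents on one puzzle
round_table = {"GPT-3.5": "B", "Mixtral": "B", "Llama2": "C"}
print(consensus_answer(round_table))  # -> "B"
```

In the full technique, agents also see each other's explanations and can revise their answers over several rounds before the final vote; this sketch shows only the final aggregation step.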
Jun-7-2024