QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture

Prakash, Shvetank; Cheng, Andrew; Yik, Jason; Tschand, Arya; Ghosal, Radhika; Uchendu, Ikechukwu; Quaye, Jessica; Ma, Jeffrey; Grampurohit, Shreyas; Giannuzzi, Sofia; Balyan, Arnav; Amin, Fin; Pipersenia, Aadya; Choudhary, Yash; Nayak, Ankita; Yazdanbakhsh, Amir; Reddi, Vijay Janapa

Jan-6-2025–arXiv.org Artificial Intelligence

We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are available at https://harvard-edge.github.io/QuArch/.

large language model, machine learning, natural language

Country:
- Asia > South Korea (0.14)

Genre:
- Research Report (0.82)
- Instructional Material (0.68)

Industry:
- Education (0.69)
- Semiconductors & Electronics (0.47)
- Information Technology (0.46)

Technology:
- Information Technology
  - Architecture (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.71)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)