Testing LLM performance on the Physics GRE: some observations

Dec-7-2023–arXiv.org Artificial Intelligence

With the recent developments in large language models (LLMs) and their widespread availability through open-source models and/or low-cost APIs, several exciting products and applications are emerging, many of them in STEM educational technology for K-12 and university students. These powerful language models need to be evaluated on a range of benchmarks to understand their risks and limitations. In this short paper, we summarize and analyze the performance of Bard, a popular LLM-based conversational service made available by Google, on the standardized Physics GRE examination.

bard, language model, llm

Country:
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > Belgium
  - Brussels-Capital Region > Brussels (0.04)

Genre:
- Research Report (0.40)
- Instructional Material (0.34)

Industry:
- Education
  - Curriculum > Subject-Specific Education (0.49)
  - Educational Setting > Higher Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.98)