Testing LLM performance on the Physics GRE: some observations
arXiv.org Artificial Intelligence
With recent developments in large language models (LLMs) and their widespread availability through open-source models and/or low-cost APIs, several exciting products and applications are emerging, many of them in STEM educational technology for K-12 and university students. These powerful language models need to be evaluated on several benchmarks in order to understand their risks and limitations. In this short paper, we summarize and analyze the performance of Bard, a popular LLM-based conversational service made available by Google, on the standardized Physics GRE examination.
Dec-7-2023
- Country:
- North America > United States
- New York > New York County > New York City (0.04)
- Europe > Belgium
- Brussels-Capital Region > Brussels (0.04)
- Genre:
- Research Report (0.40)
- Instructional Material (0.34)
- Industry:
- Education
- Curriculum > Subject-Specific Education (0.49)
- Educational Setting > Higher Education (0.34)
- Technology: