Testing GPT-4-o1-preview on math and science problems: A follow-up study

Oct-11-2024–arXiv.org Artificial Intelligence

In August 2023, Scott Aaronson and I reported the results of testing GPT-4 with the Wolfram Alpha and Code Interpreter plug-ins on a collection of 105 original high-school-level and college-level science and math problems (Davis and Aaronson, 2023). In September 2024, I tested the recently released model GPT-4-o1-preview on the same collection. Overall, I found that performance had improved significantly but still fell considerably short of perfect. In particular, problems that involve spatial reasoning are often stumbling blocks. On September 12, OpenAI (2024) released two preliminary versions, "ChatGPT-o1-preview" and "ChatGPT-o1-mini", of a forthcoming product, "ChatGPT-o1".

large language model, machine learning, natural language

Country:
- South America > Venezuela
  - Capital District > Caracas (0.04)
- North America
  - Canada > Quebec (0.05)
  - United States
    - New York (0.04)
    - Michigan (0.04)
    - Texas > Potter County
      - Amarillo (0.04)
    - California > San Francisco County
      - San Francisco (0.05)
- Europe
  - France (0.05)
  - Spain (0.04)
  - Belgium (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
- Atlantic Ocean > North Atlantic Ocean
  - English Channel (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Africa > Middle East
  - Egypt > Cairo Governorate > Cairo (0.04)

Genre:
- Research Report (0.50)

Industry:
- Education > Educational Setting (0.55)
- Government > Space Agency (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)