Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
Davis, Ernest; Aaronson, Scott
arXiv.org Artificial Intelligence
Our test sets were too small and too haphazard to support statistically valid conclusions, but they were suggestive of a number of conclusions. We summarize these here, and discuss them at greater length in Section 7. Over the kinds of problems tested, GPT-4 with either plug-in is significantly stronger than GPT-4 by itself, or, almost certainly, than any AI that existed a year ago. However, it is still far from reliable; it often outputs a wrong answer or fails to output any answer. In terms of overall score, we would judge that these systems perform at the level of a middling undergraduate student. However, their capacities and weaknesses do not align with those of a human student; the systems solve some problems that even capable students would find challenging, whereas they fail on some problems that even middling high school students would find easy.
Aug-14-2023
- Country:
- Atlantic Ocean (0.04)
- South America
- Venezuela > Capital District
- Caracas (0.04)
- Argentina > Pampas
- Buenos Aires F.D. > Buenos Aires (0.04)
- Oceania > New Zealand
- North Island > Waikato > Hamilton (0.04)
- North America
- United States
- Michigan (0.04)
- Maine (0.04)
- Virginia > Virginia Beach (0.04)
- District of Columbia > Washington (0.04)
- Alaska > Sitka City and Borough
- Sitka (0.04)
- South Carolina > Richland County
- Columbia (0.04)
- South Dakota > Hughes County
- Pierre (0.04)
- Indiana > Marion County
- Indianapolis (0.04)
- Florida
- Duval County > Jacksonville (0.04)
- Monroe County > Key West (0.04)
- Colorado > Denver County
- Denver (0.04)
- Texas
- Potter County > Amarillo (0.04)
- Travis County > Austin (0.04)
- Harris County > Houston (0.04)
- El Paso County > El Paso (0.04)
- New York
- New York County > New York City (0.04)
- Albany County > Albany (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Kansas > Sedgwick County
- Wichita (0.04)
- Nebraska > Douglas County
- Omaha (0.04)
- Alabama > Montgomery County
- Montgomery (0.04)
- Illinois
- Cook County > Chicago (0.04)
- Sangamon County > Springfield (0.04)
- Morgan County > Jacksonville (0.04)
- Washington
- Spokane County > Spokane (0.04)
- King County > Seattle (0.04)
- California
- Los Angeles County > Long Beach (0.04)
- Fresno County > Fresno (0.04)
- Alameda County
- Mexico > Mexico City
- Mexico City (0.04)
- Jamaica > Kingston
- Kingston (0.04)
- Canada
- Quebec (0.04)
- Ontario > Toronto (0.04)
- Alberta > Census Division No. 11
- Edmonton Metropolitan Region > Edmonton (0.04)
- Europe > Russia
- Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia
- Vietnam > Nghệ An Province (0.04)
- Middle East > Israel
- Tel Aviv District > Tel Aviv (0.04)
- Africa > Middle East
- Egypt > Cairo Governorate > Cairo (0.04)
- Genre:
- Research Report (0.41)
- Industry:
- Education > Educational Setting > K-12 Education > Secondary School (0.54)
- Technology: