Large Language Models Still Face Challenges in Multi-Hop Reasoning with External Knowledge
arXiv.org Artificial Intelligence
We carry out a series of experiments to test large language models' multi-hop reasoning ability from three aspects: selecting and combining external knowledge, dealing with non-sequential reasoning tasks, and generalising to data samples with larger numbers of hops. We test the GPT-3.5 model on four reasoning benchmarks with Chain-of-Thought prompting (and its variations). Our results reveal that, despite the impressive performance achieved by large language models on various reasoning tasks, the models still suffer from severe drawbacks, which indicates a large gap with humans.
Dec-11-2024
- Country:
  - Asia
    - China (0.14)
    - East Asia (0.04)
    - Middle East > Jordan (0.04)
    - Russia > Siberian Federal District
      - Novosibirsk Oblast > Novosibirsk (0.04)
    - South Korea (0.05)
  - Europe
  - North America > United States
    - New Jersey > Hudson County
      - Jersey City (0.05)
    - California
      - Los Angeles County (0.14)
      - San Diego County > San Diego (0.04)
    - Oklahoma (0.04)
    - Kentucky > Boyle County (0.04)
    - Maryland (0.04)
    - Pennsylvania > Blair County (0.04)
    - South Carolina > Union County (0.04)
    - New York (0.04)
    - Indiana
      - Porter County (0.04)
      - Vigo County > Terre Haute (0.04)
    - Minnesota
      - Hennepin County > Minneapolis (0.04)
      - Le Sueur County (0.04)
    - Arkansas (0.04)
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Energy (1.00)
  - Government
  - Health & Medicine > Therapeutic Area (1.00)
  - Leisure & Entertainment > Sports
    - Basketball (1.00)
  - Media
- Technology: