Large Language Models Still Face Challenges in Multi-Hop Reasoning with External Knowledge

Dec-11-2024–arXiv.org Artificial Intelligence

We carry out a series of experiments to test large language models' multi-hop reasoning ability from three aspects: selecting and combining external knowledge, dealing with non-sequential reasoning tasks, and generalising to data samples with larger numbers of hops. We test the GPT-3.5 model on four reasoning benchmarks with Chain-of-Thought prompting (and its variations). Our results reveal that, despite the strong performance achieved by large language models on various reasoning tasks, these models still suffer from severe drawbacks, which leaves a large gap between them and humans.

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.87)
