Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Su, Hung-Ting, Hsu, Ya-Ching, Lin, Xudong, Shi, Xiang-Qian, Niu, Yulei, Hsu, Han-Yuan, Lee, Hung-yi, Hsu, Winston H.
–arXiv.org Artificial Intelligence
Large language models (LLMs) equipped with chain-of-thoughts (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the narrative reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses.

Figure 1: While LLMs have revolutionized NLP reasoning, surpassing previous supervised learning (SL) methods and even reaching human-level performance on some tasks, their limitations become apparent when tested against the Trope dataset. NLU: Natural Language Understanding, CS: Commonsense. Check Sections 1 and 2.2 for details.
Sep-22-2024