Shi, Xiang-Qian
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Su, Hung-Ting, Hsu, Ya-Ching, Lin, Xudong, Shi, Xiang-Qian, Niu, Yulei, Hsu, Han-Yuan, Lee, Hung-yi, Hsu, Winston H.
Large language models (LLMs) equipped with chain-of-thought (CoT) prompting have shown significant multi-step reasoning capabilities in factual content like mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction capabilities, remains unexplored. This study utilizes tropes in movie synopses to assess the narrative reasoning abilities of state-of-the-art LLMs and uncovers their low performance. We introduce a trope-wise querying approach to address these challenges and boost the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method to embed trope-related text tokens into movie synopses.
(Figure 1 caption: While LLMs have revolutionized NLP reasoning, surpassing previous supervised learning (SL) methods and even reaching human-level performance on some tasks, their limitations become apparent when tested against the Trope dataset. NLU: Natural Language Understanding, CS: Commonsense. See Sections 1 and 2.2 of the paper for details.)
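For intuition about the trope-wise querying idea, here is a minimal sketch that asks a model one yes/no question per trope and collects the answers, instead of asking about all tropes in a single query. The prompt wording, the tropewise_query helper, and the example trope names are illustrative assumptions for this sketch, not the paper's actual prompts or evaluation protocol; the stub llm can be swapped for a call to any chat model.

from typing import Callable, Dict, List

def tropewise_query(llm: Callable[[str], str],
                    synopsis: str,
                    tropes: List[str]) -> Dict[str, bool]:
    """Ask the model one yes/no question per trope instead of one multi-trope query."""
    answers: Dict[str, bool] = {}
    for trope in tropes:
        # Hypothetical prompt template, not the paper's exact wording.
        prompt = (
            f"Movie synopsis:\n{synopsis}\n\n"
            f"Question: Does this synopsis exhibit the trope '{trope}'? "
            "Answer with 'yes' or 'no' only."
        )
        reply = llm(prompt).strip().lower()
        answers[trope] = reply.startswith("yes")
    return answers

if __name__ == "__main__":
    # Toy stub so the sketch runs without any API: says "yes" only for the quoted trope 'betrayal'.
    dummy_llm = lambda prompt: "yes" if "'betrayal'" in prompt.lower() else "no"
    print(tropewise_query(dummy_llm,
                          "A trusted ally's betrayal forces the hero to act alone.",
                          ["Betrayal", "Chekhov's Gun"]))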
Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping
Chung, Chi-Ming, Tseng, Yang-Che, Hsu, Ya-Ching, Shi, Xiang-Qian, Hua, Yun-Hung, Yeh, Jia-Fong, Chen, Wen-Chin, Chen, Yi-Ting, Hsu, Winston H.
A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real time. None of the previous learning-based or non-learning-based visual SLAM systems satisfies all these needs, due to the intrinsic limitations of their components. In this work, we develop a visual SLAM named Orbeez-SLAM, which combines an implicit neural representation (NeRF) with visual odometry to achieve these goals. Moreover, Orbeez-SLAM works with a monocular camera since it only needs RGB inputs, making it widely applicable to the real world. Results show that our SLAM is up to 800x faster than the strong baseline with superior rendering outcomes. Code link: https://github.com/MarvinChung/Orbeez-SLAM.
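For a rough picture of how such a system couples ORB-feature visual odometry with NeRF-realized dense mapping, the sketch below runs a tracking-then-mapping loop over posed RGB keyframes. The OrbTracker, NerfMap, and run_slam names are stubs invented for this illustration, not the project's real API; see the linked repository for the actual implementation.

from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Keyframe:
    image: np.ndarray   # H x W x 3 RGB frame
    pose: np.ndarray    # 4 x 4 camera-to-world transform estimated by visual odometry

class OrbTracker:
    """Stand-in for ORB-feature visual odometry: estimates a camera pose per RGB frame."""
    def track(self, image: np.ndarray) -> np.ndarray:
        return np.eye(4)  # stub pose; a real tracker matches ORB features across frames

class NerfMap:
    """Stand-in for the NeRF-based dense map optimized online from posed RGB keyframes."""
    def __init__(self) -> None:
        self.keyframes: List[Keyframe] = []
    def add_keyframe(self, kf: Keyframe) -> None:
        self.keyframes.append(kf)
    def optimize_step(self) -> None:
        pass  # a real map samples rays from keyframes and minimizes a photometric loss

def run_slam(frames: List[np.ndarray]) -> NerfMap:
    tracker, nerf_map = OrbTracker(), NerfMap()
    for image in frames:
        pose = tracker.track(image)                   # tracking: pose from RGB only (monocular)
        nerf_map.add_keyframe(Keyframe(image, pose))
        nerf_map.optimize_step()                      # mapping: refine the NeRF with the new view
    return nerf_map

if __name__ == "__main__":
    dummy_frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
    print(len(run_slam(dummy_frames).keyframes), "keyframes mapped")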