Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering
Noah Frahm, Prakrut Patel, Yue Zhang, Shoubin Yu, Mohit Bansal, Roni Sengupta
arXiv.org Artificial Intelligence
Large vision-language models (VLMs) have improved embodied question answering (EQA) agents by providing strong semantic priors for open-vocabulary reasoning. However, when used directly for step-level exploration, VLMs often exhibit frontier oscillations: unstable back-and-forth movements caused by overconfidence and miscalibration, leading to inefficient navigation and degraded answer quality. We propose Prune-Then-Plan, a simple and effective framework that stabilizes exploration through step-level calibration. Instead of trusting raw VLM scores, our method prunes implausible frontier choices using a Holm-Bonferroni-inspired pruning procedure and then delegates final decisions to a coverage-based planner. This separation converts overconfident predictions into conservative, interpretable actions by relying on human-level judgments to calibrate the step-level behavior of VLMs. Integrated into the 3D-Mem EQA framework, our approach achieves relative improvements of up to 49% and 33% in visually grounded SPL and LLM-Match metrics, respectively, over baselines. Overall, our method achieves better scene coverage under equal exploration budgets on both OpenEQA and EXPRESS-Bench datasets. We provide additional visuals of results on our Project Page.
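The abstract's pruning step follows the Holm-Bonferroni step-down idea: sort candidates by evidence of implausibility and reject sequentially against progressively looser thresholds, stopping at the first non-rejection. The sketch below is an illustrative reconstruction only; the function name, the mapping from VLM scores to p-values, and the default `alpha` are assumptions, not the authors' implementation.

```python
def holm_bonferroni_prune(p_values, alpha=0.05):
    """Return indices of frontiers to KEEP after Holm-Bonferroni-style pruning.

    Hypothetical setup: p_values[i] is a p-value for the hypothesis
    "frontier i is a plausible exploration target"; a small value is
    strong evidence of implausibility, so that frontier gets pruned.
    """
    m = len(p_values)
    # Step down from the most implausible (smallest p-value) candidate.
    order = sorted(range(m), key=lambda i: p_values[i])
    pruned = set()
    for rank, idx in enumerate(order):
        # Holm's step-down threshold: alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            pruned.add(idx)  # reject plausibility -> prune this frontier
        else:
            break  # first non-rejection stops the procedure
    return [i for i in range(m) if i not in pruned]
```

Surviving frontiers would then be handed to the coverage-based planner rather than ranked by the raw VLM scores, which is what converts overconfident predictions into conservative actions.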
Nov-26-2025