Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
