
Collaborating Authors

 Bourgogne-Franche-Comté
SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models

Neural Information Processing Systems

Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities.
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Xiaoyi Dong

Neural Information Processing Systems

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.
and

Neural Information Processing Systems

Successfully employing cutting planes can be challenging because there are infinitely many cuts to choose from and there are still many open questions about which cuts to employ when.