AITopics | spatial relation

However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance the visual planning skills of LLMs.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

844f722dbbcb27933ff5baf58a1f00c8-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Feb-10-2026, 09:57:36 GMT

dataset, expression, representation, (15 more...)

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

819aaee144cb40e887a4aa9e781b1547-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 07:30:49 GMT

dataset, scanrefer dataset, vil3drel model, (13 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.30)

819aaee144cb40e887a4aa9e781b1547-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 07:30:45 GMT

proposal, relation, spatial relation, (17 more...)

Genre: Research Report (0.46)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

76dc611d6ebaafc66cc0879c71b5db5c-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 00:16:43 GMT

information, relation, spatial relation, (16 more...)

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets

Neural Information Processing Systems | Dec-24-2025, 16:01:42 GMT

Humans naturally use verbal utterances and nonverbal gestures to refer to various objects (known as $\textit{referring expressions}$) in different interactional scenarios. As collecting real human interaction datasets is costly and laborious, synthetic datasets are often used to train models to unambiguously detect relationships among objects. However, existing synthetic data generation tools that provide referring expressions generally neglect nonverbal gestures. Additionally, while a few small-scale datasets contain multimodal cues (verbal and nonverbal), these datasets capture the nonverbal gestures only from an exo-centric (observer) perspective. As models can use complementary information from multimodal cues to recognize referring expressions, generating multimodal data from multiple views can help to develop robust models.

embodied simulator, generating multimodal, name change, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)


29daf9442f3c0b60642b14c081b4a556-Paper.pdf

df027cf11469e746ef94d583f9f5537f-Paper-Conference.pdf

VisMin: Visual Minimal-Change Understanding

3a7f9e485845dac27423375c934cb4db-Supplemental-Conference.pdf

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
