AITopics | image token

Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing

Neural Information Processing Systems | Jun-20-2026, 23:00:53 GMT

Large Vision-Language Models (LVLMs) with discrete image tokenizers unify multimodal representations by encoding visual inputs into a finite set of tokens. Despite their effectiveness, we find that these models still hallucinate non-existent objects. We hypothesize that this may be due to visual priors induced during training: When certain image tokens frequently co-occur in the same spatial regions and represent shared objects, they become strongly associated with the verbalizations of those objects.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

The Narrow Gate: Localized Image-Text Communication in Native Multimodal Models

Neural Information Processing Systems | Jun-18-2026, 04:52:11 GMT

Recent advances in multimodal training have significantly improved the integration of image understanding and generation within a unified model. This study investigates how vision-language models (VLMs) handle image-understanding tasks, focusing on how visual information is processed and transferred to the textual domain. We compare native multimodal VLMs (models trained from scratch on multimodal data to generate both text and images) with non-native multimodal VLMs (models adapted from pre-trained large language models or capable of generating only text), highlighting key differences in information flow. We find that in native multimodal VLMs, image and text embeddings are more separated within the residual stream. Moreover, VLMs differ in how visual information reaches text: non-native multimodal VLMs exhibit a distributed communication pattern, where information is exchanged through multiple image tokens, whereas models trained natively for joint image and text generation tend to rely on a single post-image token that acts as a narrow gate for visual information. We show that ablating this single token significantly deteriorates image-understanding performance, whereas targeted, token-level interventions reliably steer image semantics and downstream text with fine-grained control.

large language model, machine learning, multimodal model, (21 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.88)

996e2b446391fcb8bf32a3d1645cc799-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 01:59:17 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(2 more...)

GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Neural Information Processing Systems | Feb-15-2026, 15:15:35 GMT

With high-quality image generation achieved, the next critical step is to enhance user controllability.

diffusion model, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

7dd309df03d37643b96f5048b44da798-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 11:31:41 GMT

artificial intelligence, class identifiability evolution, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Newfoundland and Labrador > Newfoundland (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

7dd309df03d37643b96f5048b44da798-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 11:31:37 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

cd062f8003e38f55dcb93df55b2683d6-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 23:12:31 GMT

concept token, representation, visual concept, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangxi Province > Nanning (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.94)

2d9c6cdb4cfe93869c090fea7375044b-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 10:28:25 GMT

arxiv preprint arxiv, information, modeling, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(2 more...)

2492288f6878e6f99124b362604e58f5-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 13:56:08 GMT

information, selection token, tag entity, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > Canada > Ontario > Toronto (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

A Training details

Neural Information Processing Systems | Feb-8-2026, 12:07:04 GMT

Models were trained with 32 experts, with experts placed every 2 layers, except where explicitly stated. The learned contrastive temperature parameter is initialised at 10. We train models at batch size 16,384 for 781,250 steps at resolution 224. These are B/16 models trained for 100,000 steps at batch size 8,192. The default training data is mixed with data from JFT-4B with a ratio of 3:1.
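To put the two step counts quoted above on a common scale, the following minimal sketch multiplies batch size by number of steps to get the total number of training examples processed. The helper name `examples_seen` and the labels `long_run` and `short_run` are hypothetical; the excerpt does not say which experiments each configuration belongs to, so this is only arithmetic on the quoted figures.

```python
def examples_seen(batch_size: int, steps: int) -> int:
    """Total training examples processed: one batch per step."""
    return batch_size * steps


# Figures quoted in the excerpt; the labels are illustrative assumptions.
long_run = examples_seen(batch_size=16_384, steps=781_250)
short_run = examples_seen(batch_size=8_192, steps=100_000)

print(f"long run:  {long_run:,} examples")   # 12,800,000,000
print(f"short run: {short_run:,} examples")  # 819,200,000
```

On these numbers the longer configuration processes roughly 12.8 billion examples and the shorter one roughly 0.82 billion. These counts include repeats if the dataset is smaller than the total.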

artificial intelligence, machine learning, text token, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
