AITopics | textharmony

Harmonizing Visual Text Comprehension and Generation

Neural Information Processing Systems, Mar-22-2026, 02:37:47 GMT

Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space.

artificial intelligence, name change, proceedings, (3 more...)


Technology: Information Technology > Artificial Intelligence (0.39)


b0ca717599b7ba84d5e4f4c8b1ef6657-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 12:25:19 GMT

arxiv preprint arxiv, large language model, machine learning, (19 more...)


Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
(2 more...)


b0ca717599b7ba84d5e4f4c8b1ef6657-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 13:30:22 GMT

arxiv preprint arxiv, slide-lora, textharmony, (15 more...)


Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Media (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
(2 more...)


Harmonizing Visual Text Comprehension and Generation

Neural Information Processing Systems, May-27-2025, 13:02:25 GMT

Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space. Additionally, we develop a high-quality image caption dataset, DetailedTextCaps-100K, synthesized with a sophisticated closed-source MLLM to enhance visual text generation capabilities further. Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach.
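The idea of dynamically aggregating LoRA experts can be illustrated with a small sketch. This is not the paper's implementation: the class names `LoRAExpert` and `GatedLoRALinear`, the choice of three experts, the softmax gate computed from the input vector, and all dimensions are assumptions made for illustration only. The sketch shows a frozen linear layer whose output is adjusted by a gate-weighted sum of low-rank updates.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, v):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAExpert:
    """One low-rank update: delta(x) = B (A x). Hypothetical helper."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rand_matrix(rank, d_in, rng)          # down-projection
        self.B = [[0.0] * rank for _ in range(d_out)]  # up-projection, zero init

    def delta(self, x):
        return matvec(self.B, matvec(self.A, x))


class GatedLoRALinear:
    """Frozen base weight plus a gate-weighted sum of LoRA experts.

    Assumption: experts could stand for e.g. a text-oriented, an
    image-oriented, and a modality-agnostic adapter; the gate here is a
    plain linear layer followed by softmax.
    """

    def __init__(self, d_in, d_out, rank=2, n_experts=3, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(d_out, d_in, rng)  # frozen base weight
        self.experts = [LoRAExpert(d_in, d_out, rank, rng) for _ in range(n_experts)]
        self.gate = rand_matrix(n_experts, d_in, rng)

    def gate_weights(self, x):
        # One non-negative weight per expert, summing to 1.
        return softmax(matvec(self.gate, x))

    def forward(self, x):
        weights = self.gate_weights(x)
        out = matvec(self.W, x)
        for w, expert in zip(weights, self.experts):
            d = expert.delta(x)
            out = [o + w * d_i for o, d_i in zip(out, d)]
        return out


layer = GatedLoRALinear(d_in=4, d_out=3)
x = [1.0, 0.5, -0.5, 2.0]
weights = layer.gate_weights(x)
y = layer.forward(x)
```

Because the gate depends on the input, different inputs can lean on different experts while sharing one set of base weights, which is one way to read "partially decoupling" in the abstract. With the zero-initialized `B` matrices, the layer initially reproduces the frozen base output.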

harmonizing visual text comprehension, textharmony, visual text comprehension and generation, (1 more...)


Technology: Information Technology > Artificial Intelligence > Natural Language (0.46)

