Counterfactual Evolution of Multimodal Datasets via Visual Programming

Jun-18-2026, 14:02:18 GMT–Neural Information Processing Systems

The rapid development of Multimodal Large Language Models (MLLMs) poses increasing demands on the diversity and complexity of multimodal datasets, yet manual annotation pipelines can no longer keep pace. Existing augmentation methods often follow fixed rules and lack verifiable control over sample diversity and reasoning complexity. To address this, we introduce Scalable COunterfactual Program Evolution (SCOPE), a framework that uses symbolic Visual Programming to guide program evolution via counterfactual reasoning. SCOPE performs the three steps of counterfactual inference: (1) Abduction, by generating verifiable programs to model reasoning associations; (2) Action, by intervening on program structure along three axes: reasoning path, visual context, and cross-instance composition; and (3) Prediction, by categorizing evolved instances by difficulty, structure, and input multiplicity. Based on this process, we build SCOPE-Train and SCOPE-Test, evolving benchmarks with expert validation. To support training, we propose MAP, a curriculum learning strategy that aligns model capacity with sample difficulty. Experiments show that SCOPE improves reasoning performance, exposes model blind spots, and enhances visual dialog capabilities.
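As a rough illustration only, the abduction, action, and prediction steps can be sketched on a toy symbolic program. This is not the SCOPE implementation: the `Program` class, the `count:`/`add` step vocabulary, the dictionary scene representation, and the step-count difficulty tag are all hypothetical stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass

Scene = dict  # hypothetical scene: maps object category -> count


@dataclass(frozen=True)
class Program:
    steps: tuple  # "count:<category>" pushes a count; "add" sums the top two


def run(program, scene):
    """Execute a program on a scene with a small stack machine."""
    stack = []
    for step in program.steps:
        if step.startswith("count:"):
            stack.append(scene.get(step.split(":", 1)[1], 0))
        elif step == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown step: {step}")
    return stack[-1]


def abduce(candidates, scene, answer):
    """Abduction: return the first candidate whose execution reproduces the answer."""
    for program in candidates:
        if run(program, scene) == answer:
            return program
    return None


def extend_path(program, category):
    """Action on the reasoning path: count one more category and add it."""
    return Program(program.steps + (f"count:{category}", "add"))


def edit_context(scene, category, count):
    """Action on the visual context: return a copy with one count changed."""
    return {**scene, category: count}


def predict(program, scene):
    """Prediction: re-execute and tag the instance with a step-count difficulty."""
    return {"answer": run(program, scene), "difficulty": len(program.steps)}


scene = {"cat": 2, "dog": 3}
candidates = [
    Program(("count:cat",)),
    Program(("count:cat", "count:dog", "add")),
]
base = abduce(candidates, scene, answer=5)   # verified against the known answer
evolved = extend_path(base, "bird")          # longer reasoning path
new_scene = edit_context(scene, "bird", 4)   # counterfactual visual context
result = predict(evolved, new_scene)
print(result)  # {'answer': 9, 'difficulty': 5}
```

Because the evolved answer comes from executing the edited program rather than from a fresh annotation, each new instance stays checkable, which is the sense in which the toy mirrors the "verifiable" control described above.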

large language model, machine learning, programming language

Country:
- Asia (0.28)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.87)

Technology:
- Information Technology
  - Software > Programming Languages (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks (0.93)
    - Natural Language > Large Language Model (0.89)
