The supplementary materials include a detailed description of how the experiments were implemented.
Neural Information Processing Systems
We use BLIP-2 models built on the FLAN-T5 family of language models, with the same padding side as the underlying FLAN-T5 models. All datasets and models use a batch size of 8. The Q-Former is kept in full precision. To produce decompositions, we use beam-search multinomial sampling with 5 beams and a top-p of 0.95.
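To make the decoding settings concrete, here is a minimal sketch of the nucleus (top-p) step that a top-p sampler applies to a next-token distribution before drawing from it. It also records the stated settings as a dictionary whose keys are Hugging Face `generate()` keyword arguments (`do_sample`, `num_beams`, `top_p`). Using that library is an assumption and is not stated in the text. The helper names `top_p_filter` and `sample_from_nucleus` are hypothetical, and beam bookkeeping is omitted.

```python
import random

# Hypothetical mapping of the stated settings onto Hugging Face `generate()`
# keyword arguments; the library choice is an assumption, not from the source.
GENERATION_KWARGS = {
    "do_sample": True,  # sample tokens instead of picking the argmax
    "num_beams": 5,     # with do_sample=True: beam-search multinomial sampling
    "top_p": 0.95,      # nucleus threshold
}
BATCH_SIZE = 8  # batch size stated for all datasets and models


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose mass reaches top_p.

    Returns (token_index, renormalized_prob) pairs, most probable first.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # the token that crosses the threshold is kept
            break
    total = sum(probs[i] for i in kept)
    return [(i, probs[i] / total) for i in kept]


def sample_from_nucleus(probs, top_p, rng):
    """Draw one token index from the renormalized top-p nucleus (multinomial draw)."""
    nucleus = top_p_filter(probs, top_p)
    indices = [i for i, _ in nucleus]
    weights = [p for _, p in nucleus]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    next_token_probs = [0.5, 0.3, 0.15, 0.05]  # toy distribution for one beam
    print(top_p_filter(next_token_probs, top_p=0.9))
    print(sample_from_nucleus(next_token_probs, GENERATION_KWARGS["top_p"], rng))
```

In a beam-search sampler, a filter like this is applied roughly at every decoding step to the candidate distributions. The sampled tokens then extend the 5 beams rather than a single sequence.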
Feb-16-2026, 14:40:27 GMT