Supplementary for Paper2Poster: Benchmarking Multimodal Poster Automation from Scientific Papers
–Neural Information Processing Systems
AAblation Study1 We conduct ablation studies to evaluate three key design choices in PosterAgent: (1) the binary-tree2 layout strategy for layout planning; (2) the inclusion of a commenter module as a visual critic; and3 (3) the use of in-context examples to enhance the visual perception capabilities of the commenter.4 We define the following variants:5 Direct: replacing the binary-tree layout with direct layout generation by an LLM;6 Tree: using the binary-tree layout strategy but removing the commenter module;7 Tree + Commenter: including the commenter module but without in-context examples;8 Tree + Commenter + IC: the full system, with both the commenter and in-context examples.9 All ablation variants are implemented using PosterAgent-4o, keeping all other components un-10 changed to isolate the effect of each factor. We visualize and compare results across five randomly11 selected papers from Paper2Poster, as shown in Figures 1 to 5.12 When prompting the LLM to directly generate poster layouts (Direct), the results are often structurally13 compromised (e.g., Figures 1a-3a), or resemble blog-style layouts that lack visual hierarchy and14 appeal (Figures 4a,5a). Fine-grained layout components, such as text boxes and figures, are especially15 challenging to synthesize in this setting: for instance, Figures1a-4a exhibit missing text boxes that16 leave noticeable blank areas, and Figure 4a fails to preserve the correct aspect ratio of figures.17
Neural Information Processing Systems
Jun-15-2026, 03:27:31 GMT
- Technology: