Supplementary for Paper2Poster: Benchmarking Multimodal Poster Automation from Scientific Papers
A Ablation Study
We conduct ablation studies to evaluate three key design choices in PosterAgent: (1) the binary-tree layout strategy for layout planning; (2) the inclusion of a commenter module as a visual critic; and (3) the use of in-context examples to enhance the visual perception capabilities of the commenter. We define the following variants:
Direct: replacing the binary-tree layout with direct layout generation by an LLM;
Tree: using the binary-tree layout strategy but removing the commenter module;
Tree + Commenter: including the commenter module but without in-context examples;
Tree + Commenter + IC: the full system, with both the commenter and in-context examples.
All ablation variants are implemented using PosterAgent-4o, keeping all other components unchanged to isolate the effect of each factor. We visualize and compare results across five randomly selected papers from Paper2Poster, as shown in Figures 1 to 5.
When prompting the LLM to directly generate poster layouts (Direct), the results are often structurally compromised (e.g., Figures 1a-3a), or resemble blog-style layouts that lack visual hierarchy and appeal (Figures 4a, 5a). Fine-grained layout components, such as text boxes and figures, are especially challenging to synthesize in this setting: for instance, Figures 1a-4a exhibit missing text boxes that leave noticeable blank areas, and Figure 4a fails to preserve the correct aspect ratio of figures.
Supplementary to Smooth Bilevel Programming for Sparse Regularization
Clarice Poon, Gabriel Peyré
A Pseudocode for gradient descent implementation
Note that ∇f(β_t) = g_t is computed either as in line 5 or line 9 of the algorithm, and one can use these computations for any gradient-based algorithm (e.g. ...). Note also that this is simply gradient descent on a smooth function, and one can apply typical methods for choosing the stepsize γ_k, such as the Barzilai-Borwein stepsize [Barzilai and Borwein, 1988].

Algorithm 1: Gradient descent implementation of Ncvx-Pro for solving Lasso.
1 initialization: v_0 ∈ ℝ^n (with no zero entries), stepsize γ_t > 0; Result: β_t
2 while not converged do
3   if n ≤ m and λ > 0 then
4     u_t = diag(v_t) X^⊤ X diag(v_t) + λ Id

To show that i) implies ii), recall that a convex, proper and lower semicontinuous function φ can be written in terms of its convex conjugate, which has domain ℝ^d. For the expression of φ when R is a norm, from the above, we know that φ = (φ)(z), and recall that for any norm, R(β) = max_{R*(w) ≤ 1} ⟨w, β⟩. We derive some properties of the function h: Lemma 1.
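As an illustration of the stepsize remark above, the following is a minimal sketch of gradient descent with a Barzilai-Borwein stepsize on a generic smooth objective. It is not the Ncvx-Pro algorithm itself: the function name `bb_gradient_descent`, its parameters, and the fallback rule for a non-positive curvature estimate are all assumptions made for this example, and `grad` stands in for whatever routine returns g_t.

```python
import numpy as np

def bb_gradient_descent(grad, x0, gamma0=1e-3, tol=1e-10, max_iter=500):
    """Gradient descent with the Barzilai-Borwein (BB1) stepsize.

    grad   : callable returning the gradient at a point (hypothetical stand-in)
    x0     : starting point
    gamma0 : initial stepsize, also used as a fallback (an assumption)
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    gamma = gamma0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - gamma * g
        g_new = grad(x_new)
        s = x_new - x          # change in iterate
        y = g_new - g          # change in gradient
        sy = float(s @ y)
        # BB1 rule: gamma = <s, s> / <s, y>; fall back if curvature is not positive
        gamma = float(s @ s) / sy if sy > 1e-16 else gamma0
        x, g = x_new, g_new
    return x
```

For example, with a strictly convex quadratic whose gradient is `A @ x - b`, the returned point should approximate the solution of `A x = b`. The BB rule needs only the two most recent iterates and gradients, which is why it pairs naturally with a routine that already produces g_t at every step.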
Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity
1 Further Results of the Impact of Sparsity on Shape Bias Benchmark
We utilize the sparsity operation proposed in Section 3.1 for ResNet-50. For ViT, we also apply the spatial Top-K operation as described in the general response. We observe an increase in shape bias for both the ResNet-50 and ViT-B architectures, further closing the gap between humans and existing models. We generalize Section 4.2 of the main text to the ResNet-50 and ViT-B architectures (Figure 1). The sparsity definition for ResNet-50 is the same as for AlexNet and VGG. For ViT-B, we reshape the intermediate activation response from [n, h * w, d] to [n, d, h * w] and apply the Top-K selection over dimension 2 before the activation is passed through the multi-head attention. (Note that h and w are the height and width of the latent tensor after reshaping it to 2D; for ViT-B with patch size 16 on 224x224 images, h = w = 14, and n denotes the batch size.)
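The ViT-B procedure described above can be sketched in NumPy as follows. This is an illustration under several assumptions, not the authors' implementation: the function name `spatial_top_k` is hypothetical, the change of layout is done with a transpose, selection is by raw activation value rather than magnitude, non-selected entries are set to zero, and the result is returned in the original [n, h * w, d] layout.

```python
import numpy as np

def spatial_top_k(tokens, k):
    """Keep the k largest activations per channel across spatial positions.

    tokens : array of shape [n, h*w, d] (batch, spatial tokens, channels)
    k      : number of spatial positions kept per (sample, channel) pair
    """
    n, hw, d = tokens.shape
    x = np.transpose(tokens, (0, 2, 1))            # [n, d, h*w]
    # After partitioning, index hw - k holds the k-th largest value on axis 2.
    kth = np.partition(x, hw - k, axis=2)[:, :, hw - k][..., None]
    # Ties with the k-th value would keep more than k entries.
    sparse = np.where(x >= kth, x, 0.0)            # zeroing is an assumption
    return np.transpose(sparse, (0, 2, 1))         # back to [n, h*w, d]
```

With h = w = 14, each sample has 196 spatial tokens, so a call such as `spatial_top_k(tokens, 20)` on a `[n, 196, d]` array keeps 20 positions per channel. The value 20 is only an example; the source does not state the K used for ViT-B in this passage.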
On Path Integration of Grid Cells: Group Representation and Isotropic Scaling
Understanding how grid cells perform path integration calculations remains a fundamental problem. In this paper, we conduct a theoretical analysis of a general representation model of path integration by grid cells, where the 2D self-position is encoded as a higher-dimensional vector, and the 2D self-motion is represented by a general transformation of the vector. We identify two conditions on the transformation. One is a group representation condition that is necessary for path integration. The other is an isotropic scaling condition that ensures locally conformal embedding, so that the error in the vector representation translates conformally to the error in the 2D self-position. Then we investigate the simplest transformation, i.e., the linear transformation, uncover its explicit algebraic and geometric structure as a matrix Lie group of rotations, and explore the connection between the isotropic scaling condition and a special class of hexagon grid patterns. Finally, with our optimization-based approach, we manage to learn hexagon grid patterns that share similar properties with the grid cells in the rodent brain. The learned model is capable of accurate long-distance path integration.
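To make the group representation condition concrete, here is a toy sketch of a linear transformation model built from 2x2 rotation blocks. It is an assumption-laden illustration, not the model learned in the paper: the names `rotation_blocks`, `integrate`, and `FREQS` are hypothetical, and the three wave vectors spaced 120 degrees apart with magnitude 2.0 are chosen by hand. The only property it demonstrates is that applying M(dx) step by step gives the same vector as applying M to the total displacement.

```python
import numpy as np

def rotation_blocks(dx, freqs):
    """Block-diagonal matrix M(dx); block i rotates by the angle <freqs[i], dx>."""
    k = len(freqs)
    m = np.zeros((2 * k, 2 * k))
    for i, q in enumerate(freqs):
        a = float(np.dot(q, dx))
        c, s = np.cos(a), np.sin(a)
        m[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return m

# Hypothetical wave vectors: equal magnitude, directions 120 degrees apart.
_angles = np.array([0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0])
FREQS = 2.0 * np.stack([np.cos(_angles), np.sin(_angles)], axis=1)

def integrate(v0, steps, freqs=FREQS):
    """Path integration: update the position vector once per self-motion step."""
    v = np.asarray(v0, dtype=float)
    for dx in steps:
        v = rotation_blocks(dx, freqs) @ v
    return v
```

Because every block is a planar rotation, M(dx1 + dx2) = M(dx1) M(dx2) holds exactly, and the norm of the vector is preserved along the path. Comparing `integrate(v0, steps)` with `rotation_blocks(steps.sum(axis=0), FREQS) @ v0` checks this numerically.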
Supplementary: Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation
The raw video frames are forwarded through a person-detector [15] to obtain the person-focused image sequences. Note that the detector-pruned video sequences may not have a smooth pixel transition. However, they retain a smooth pose transition in the view-variant root-relative system. In our work, the shared latent pose can be seen as a parametric form that represents plausible 3D poses. The image-to-latent model is trained to regress the latent pose parameters, with the latent serving as an intermediate 3D pose representation.