TextDiffuser: Diffusion Models as Text Painters
Diffusion models have gained increasing attention for their impressive generation abilities, but they currently struggle to render accurate and coherent text. To address this issue, we introduce TextDiffuser, which generates images containing visually appealing text that is coherent with the background. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from the text prompt; then, diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute MARIO-10M, the first large-scale dataset of text images with OCR annotations, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality.
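The abstract does not say how the generated layout is encoded for the diffusion stage. As one illustration only, assuming the layout is a list of axis-aligned keyword boxes and that it is rasterized into a character-level segmentation map (the kind of annotation MARIO-10M provides), a minimal conversion might look like the sketch below. The `KeywordBox` type, the lowercase `ALPHABET` vocabulary, and the equal-width split of each box among its characters are all hypothetical choices, not TextDiffuser's actual procedure.

```python
from dataclasses import dataclass

# Hypothetical character vocabulary; class 0 is reserved for background.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}


@dataclass
class KeywordBox:
    """A keyword and its box: (x0, y0) inclusive, (x1, y1) exclusive. Hypothetical format."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def layout_to_char_mask(layout, height, width):
    """Rasterize keyword boxes into a per-pixel character-class map.

    Each box is split into equal-width cells, one per character (an
    assumption). Pixels outside every box, and characters missing from
    ALPHABET, get class 0 (background).
    """
    mask = [[0] * width for _ in range(height)]
    for box in layout:
        text = box.text.lower()
        n = len(text)
        if n == 0:
            continue
        box_w = box.x1 - box.x0
        for k, ch in enumerate(text):
            # Horizontal extent of the k-th character cell inside the box.
            cx0 = box.x0 + (k * box_w) // n
            cx1 = box.x0 + ((k + 1) * box_w) // n
            cls = CHAR_TO_ID.get(ch, 0)
            # Clip the cell to the image bounds before painting it.
            for y in range(max(box.y0, 0), min(box.y1, height)):
                for x in range(max(cx0, 0), min(cx1, width)):
                    mask[y][x] = cls
    return mask
```

For example, `layout_to_char_mask([KeywordBox("cat", 0, 0, 6, 2)], 3, 8)` yields two rows of `[3, 3, 1, 1, 20, 20, 0, 0]` followed by a row of zeros. An integer class map like this is one common way to feed spatial text structure to a conditional generator, but other encodings (for example, per-character one-hot channels) are equally plausible.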
Position-based Scaled Gradient for Model Quantization and Pruning: Appendix
In this experiment, we quantize only the weights, not the activations, to compare how performance degrades as the weight bit-width decreases. We also report the mean squared error (MSE) of the weights at each bit-width. The column lists each layer's name, with its number of parameters in parentheses. All numbers are results from the last epoch. Table A3: ResNet-32 trained with Adam on the CIFAR-100 dataset.
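To make the weight-only setup concrete, the sketch below quantizes a synthetic weight vector at several bit-widths and reports the MSE between the full-precision and quantized weights, which is our reading of "the MSE of the weights". It assumes a symmetric uniform quantizer with a single per-tensor scale; the quantizer used in the paper may differ, and the Gaussian weights are synthetic, not taken from ResNet-32.

```python
import random


def quantize_uniform(weights, bits):
    """Symmetric uniform quantization onto signed `bits`-bit integer levels.

    Assumption: one per-tensor scale set by the largest absolute weight.
    """
    assert bits >= 2, "symmetric quantizer needs at least 2 bits"
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    # Round to the nearest level (Python's round breaks ties to even),
    # clamp to [-qmax, qmax], then map back to the real-valued grid.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic layer
errors = {b: mse(weights, quantize_uniform(weights, b)) for b in (8, 6, 4, 2)}
for b, e in errors.items():
    print(f"{b}-bit weight MSE: {e:.2e}")
```

Each bit removed roughly halves the number of levels and doubles the step size, so the rounding error grows quickly as the bit-width shrinks; this is the trend the weight-MSE column is meant to expose alongside the accuracy drop.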
4aa13186c795a52ba88f5b822f4b77eb-Paper-Conference.pdf
Therefore, estimating how well a given model might perform on new data is an important step toward reliable ML applications. This is very challenging, however, as the data distribution can change in flexible ways, and we may not have any labels on the new data, which is often the case in monitoring settings. In this paper, we propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features.
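The sketch below illustrates why a joint shift of labels and features becomes tractable when it is confined to a few features. We read SJS as: for a small feature subset S, p_t(x_{-S} | x_S, y) = p_s(x_{-S} | x_S, y), while p_t(x_S, y) may differ from p_s(x_S, y) (our paraphrase; the formal definition is in the paper). Under that reading, target accuracy equals the source expectation of w(x_S, y) · 1[f(x) = y], with w(x_S, y) = p_t(x_S, y) / p_s(x_S, y). The function assumes x_S is discrete and that the target joint p_t(x_S, y) is known; estimating it without target labels is the hard part the paper addresses, and this is not the paper's estimator.

```python
from collections import Counter


def importance_weighted_accuracy(source, target_joint):
    """Estimate target accuracy from labeled source data under an SJS-style shift.

    source: list of (x_s, y, correct) tuples, where x_s is the value of the
        shifted feature subset, y is the label, and correct says whether the
        model's prediction on that source example was right.
    target_joint: dict mapping (x_s, y) -> p_t(x_s, y). Assumption: known
        (an oracle). Every target cell with p_t > 0 must also appear in the
        source; cells missing from the source cannot be reweighted.
    """
    n = len(source)
    counts = Counter((x_s, y) for x_s, y, _ in source)
    total = 0.0
    for x_s, y, correct in source:
        p_s = counts[(x_s, y)] / n                  # empirical p_s(x_s, y)
        w = target_joint.get((x_s, y), 0.0) / p_s  # importance weight
        total += w * float(correct)
    return total / n
```

For example, with three correct and one wrong source example in cell ("a", 0) and two correct and two wrong in cell ("b", 1), the source accuracy is 0.625. If the target joint is {("a", 0): 0.2, ("b", 1): 0.8}, the estimate is 0.2 · 0.75 + 0.8 · 0.5 = 0.55, lower because the target concentrates on the cell where the model is weaker.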
45c166d697d65080d54501403b433256-AuthorFeedback.pdf
The reviewers acknowledge that the ideas presented in the paper are compelling, sound, and appear to be effective (R3), offering a great addition to the GP literature (R1) that is also supported by a solid and interesting theoretical foundation (R2, R4). Existing multi-output GP models are not applicable to our setting (see lines 79-83) and are thus not comparable to the DAG-GP model. We have further clarified this point in Section 1.2.