Large Language Model
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to unlearn the toxic content seen at pretraining. We also explore parameter-efficient training methods for detoxification.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Using a single projection vector, we then project these individual sub-tokens onto a one-dimensional subspace. Importantly, we notice that we can initialize this projection vector cheaply using first-order batch statistics and then keep it fixed throughout training. We then reconstruct the original tokens using the same vector during the backward pass.
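The projection and reconstruction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names compress and reconstruct are hypothetical, and taking the normalized mean sub-token as the "first-order batch statistics" initialization is an assumption made for illustration.

```python
import numpy as np

def compress(tokens, num_groups, v=None):
    """Split each token into sub-tokens and keep one scalar per sub-token.

    tokens: array of shape (n, d), with d divisible by num_groups.
    v: fixed unit projection vector of length d // num_groups. If None, it is
       initialized from the batch (assumption: normalized mean sub-token).
    """
    n, d = tokens.shape
    sub = tokens.reshape(n * num_groups, d // num_groups)
    if v is None:
        mean = sub.mean(axis=0)
        v = mean / (np.linalg.norm(mean) + 1e-12)  # epsilon guards a zero mean
    coeffs = sub @ v  # projection onto a one-dimensional subspace
    return coeffs, v

def reconstruct(coeffs, v, shape):
    """Rebuild a rank-1 approximation of the tokens from the stored scalars."""
    return np.outer(coeffs, v).reshape(shape)

# Example: sub-tokens that already lie along one direction are recovered exactly.
u = np.array([1.0, 2.0, 2.0]) / 3.0               # unit-length direction
scales = np.random.default_rng(0).uniform(0.5, 2.0, size=8)
tokens = np.outer(scales, u).reshape(4, 6)        # 4 tokens, 2 sub-tokens each
coeffs, v = compress(tokens, num_groups=2)
approx = reconstruct(coeffs, v, tokens.shape)
```

In this sketch the values kept between the forward and backward pass shrink from n * d to n * num_groups scalars plus the single shared vector v. For general inputs the reconstruction is only a rank-1 approximation of each sub-token, not an exact copy.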
Supplementary Material: Segment Anything in High Quality
In this supplementary material, Section 1 first presents the additional experimental analysis of our HQ-SAM, including more zero-shot transfer comparisons to SAM on both image and video benchmarks. SAM vs. HQ-SAM on Various Backbones: In Table 1, we provide a comprehensive comparison [...] Table 2: Results on YouTubeVIS 2019 validation set and HQ-YTVIS test set using ViT-L based SAM. In Table 2, HQ-SAM achieves consistent gains of 1.4 points in Tube Mask AP [...] Robustness to Input Box Prompts: In Table 4, we compare HQ-SAM to SAM by adding various scales of noise to the input ground truth box prompts. [...] "center" point of Ground Truth (GT) masks, which is at a maximal value location in a mask's interior. [...] Results not obtained in a zero-shot manner (i.e. the training [...] HQ-SAM improves over SAM, but still cannot achieve fully correct mask prediction. HQ-SAM produces significantly more accurate boundaries.