Towards Test-Time Refusals via Concept Negation
Peiran Dong 1, Song Guo 2, Junxiao Wang 3, Bingjie Wang
The three steps are as follows: 1) Prototype: We use CLIP to encode a collection of text prompts, obtained from social media platforms, that express similar negative concepts. The encoded features are then aggregated into a comprehensive prototype feature that captures the semantics of those negative concepts.
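The Prototype step can be illustrated with a minimal sketch. The text above does not state how the encoded features are aggregated, so the sketch assumes one common choice: L2-normalize each prompt feature, average them, and renormalize. The names `build_prototype`, `l2_normalize`, and `encode_text` are hypothetical, and `toy_encoder` is only a deterministic stand-in for a CLIP text encoder so that the example runs without model weights.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_prototype(
    prompts: Sequence[str],
    encode_text: Callable[[str], Sequence[float]],
) -> Vector:
    """Aggregate per-prompt text features into one prototype feature.

    Assumption (not from the paper): aggregation is the mean of the
    L2-normalized features, followed by renormalization.
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    feats = [l2_normalize(encode_text(p)) for p in prompts]
    dim = len(feats[0])
    if any(len(f) != dim for f in feats):
        raise ValueError("all features must have the same dimension")
    mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return l2_normalize(mean)


def toy_encoder(prompt: str) -> Vector:
    """Hypothetical stand-in for a CLIP text encoder (3-dim, never zero)."""
    return [float(len(prompt)), float(prompt.count(" ") + 1), 1.0]


if __name__ == "__main__":
    # Placeholder prompts; in the paper these come from social media.
    negative_prompts = ["first negative prompt", "second negative prompt"]
    prototype = build_prototype(negative_prompts, toy_encoder)
    print(len(prototype))
```

In practice `encode_text` would wrap a real CLIP text encoder that returns one feature vector per prompt; the aggregation logic stays the same whichever encoder is plugged in.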