Feedback-based Modal Mutual Search for Attacking Vision-Language Pre-training Models
