PLIP: Language-Image Pre-training for Person Representation Learning
–Neural Information Processing Systems
Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when applied directly to person representation learning, these general pre-training methods perform unsatisfactorily. The reason is that they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we design three pretext tasks: 1) Text-guided Image Colorization, which aims to establish the correspondence between person-related image regions and fine-grained color-part textual phrases.
May-24-2025, 08:51:53 GMT
- Genre:
- Research Report
- Experimental Study (0.92)
- New Finding (0.93)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: