PLIP: Language-Image Pre-training for Person Representation Learning

May-24-2025, 08:51:53 GMT – Neural Information Processing Systems

Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when applied directly to person representation learning, these general pre-training methods perform unsatisfactorily. The reason is that they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we design three pretext tasks: 1) Text-guided Image Colorization, which aims to establish the correspondence between person-related image regions and fine-grained color-part textual phrases.

large language model, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (0.92)
  - New Finding (0.93)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language
    - Large Language Model (0.68)
    - Text Processing (1.00)