RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection

Neural Information Processing Systems 

To address this gap, we propose Relational Language-Image Pre-training (RLIP), a strategy for contrastive pre-training that leverages both entity and relation descriptions.
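As a rough illustration of what contrastive pre-training over both entity and relation descriptions could look like, the sketch below applies a generic symmetric InfoNCE-style loss twice: once to entity features paired with entity text embeddings, and once to relation features paired with relation text embeddings. This is an assumption made for illustration, not necessarily the objective RLIP uses. The function names (`contrastive_loss`, `entity_relation_loss`), the temperature of 0.07, and the unweighted sum of the two terms are all hypothetical choices.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def contrastive_loss(visual, text, temperature=0.07):
    """Symmetric InfoNCE-style loss; visual[i] is assumed to pair with text[i].

    The temperature value is an illustrative assumption.
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in text]
    n = len(v)
    # Cosine similarity between every visual/text pair, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Visual-to-text direction: each row should pick out its own column.
    v2t = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Text-to-visual direction: each column should pick out its own row.
    t2v = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (v2t + t2v)


def entity_relation_loss(entity_vis, entity_txt, relation_vis, relation_txt):
    """Hypothetical combined objective: entity term plus relation term."""
    return contrastive_loss(entity_vis, entity_txt) + contrastive_loss(
        relation_vis, relation_txt
    )
```

With this formulation, the loss is small when each visual feature is most similar to its own description and grows when the pairings are scrambled, which is the behaviour a contrastive objective is meant to reward.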