Actor-agnostic Multi-label Action Recognition with Multi-modal Query