Language Models for Semantic Extraction and Filtering in Video Action Recognition