Seeing What You're Told: Sentence-Guided Activity Recognition In Video