More Context, Less Distraction: Zero-shot Visual Classification by Inferring and Conditioning on Contextual Attributes