An attention-driven hierarchical multi-scale representation for visual recognition