Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification