AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning