Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition