Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization