Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases