Towards Low-bit Communication for Tensor Parallel LLM Inference