RTP: Rethinking Tensor Parallelism with Memory Deduplication
