DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
