DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads