FIRP: Faster LLM inference via future intermediate representation prediction