HADES: Hardware Accelerated Decoding for Efficient Speculation in Large Language Models

Open in new window