V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms

Rodrigo, Javier J. Poveda, Ahmdi, Mohamed Amine, Burrello, Alessio, Pagliari, Daniele Jahier, Benini, Luca

Mar-21-2025–arXiv.org Artificial Intelligence

The recent exponential growth of Large Language Models (LLMs) has relied on GPU-based systems. However, CPUs are emerging as a flexible and lower-cost alternative, especially when targeting inference and reasoning workloads. RISC-V is rapidly gaining traction in this area, given its open and vendor-neutral ISA. However, the RISC-V hardware for LLM workloads and the corresponding software ecosystem are not fully mature and streamlined, given the requirement of domain-specific tuning. This paper aims at filling this gap, focusing on optimizing LLM inference on the Sophon SG2042, the first commercially available many-core RISC-V CPU with vector processing capabilities. On two recent state-of-the-art LLMs optimized for reasoning, DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B, we achieve 4.32/2.29 token/s for token generation and 6.54/3.68 token/s for prompt processing, with a speed up of up 2.9x/3.0x compared to our baseline.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Mar-21-2025

arXiv.org PDF

Add feedback

Country:
- Europe
  - Switzerland > Zürich
    - Zürich (0.15)
  - Italy > Piedmont
    - Turin Province > Turin (0.09)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.59)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found