Fine-tuning LLaMA 2 inference: a comparative study of language implementations for optimal efficiency
Sazzad Hossain, Touhidul Alam Seyam, Avijit Chowdhury, Munis Xamidov, Rajib Ghose, Abhijit Pathak
arXiv.org Artificial Intelligence
This paper conducts a comparative investigation into maximizing the effectiveness of LLaMA 2 inference, a critical task in machine learning and natural language processing (NLP). Various programming languages and frameworks, including TensorFlow, PyTorch, Python, Mojo, C++, and Java, are examined, and their speed, memory consumption, and ease of implementation are assessed through extensive testing and benchmarking. The advantages and disadvantages of each strategy are noted, along with suggested optimization methods for parallel processing and hardware utilization. Additionally, the performance of the Mojo SDK, a novel framework designed for LLM inference on Apple Silicon, is investigated and compared against established implementations in C, C++, Rust, Zig, Go, and Julia. Comprehensive benchmarking on an Apple M1 Max demonstrates the Mojo SDK's competitive performance and its advantages in ease of use and Python compatibility, suggesting it is a compelling alternative for LLM inference on Apple Silicon. Implications for the future of LLM deployment on resource-limited hardware and potential avenues for further research are discussed.
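The abstract mentions measuring speed and memory consumption across implementations but does not show the benchmark methodology. As a minimal sketch of how such measurements are commonly taken, the Python snippet below times a token-generation callable and records peak allocation; `benchmark_generate` and the stand-in `dummy_generate` model are hypothetical names, not the paper's actual harness.

```python
import time
import tracemalloc

def benchmark_generate(generate, prompt, n_tokens):
    """Time a token-generation callable; report throughput and peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    tokens = generate(prompt, n_tokens)  # hypothetical model interface
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "tokens_per_sec": len(tokens) / elapsed,
        "peak_mem_mb": peak / 1e6,
    }

# Stand-in "model": emits one dummy token per step, so the harness
# itself can be exercised without loading real weights.
def dummy_generate(prompt, n_tokens):
    return ["tok"] * n_tokens

stats = benchmark_generate(dummy_generate, "hello", 256)
print(stats)
```

In a real comparison, the same harness (or its C/C++/Rust/Mojo equivalent) would wrap each language's inference loop so that tokens-per-second and peak memory are measured identically across implementations.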
Jan-30-2025