Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Conde, Javier; González, Miguel; Reviriego, Pedro; Gao, Zhen; Liu, Shanshan; Lombardi, Fabrizio

arXiv.org Artificial Intelligence 

Unfortunately, these models are closed and can only be accessed through the user interfaces, tools, or application programming interfaces provided by the companies that developed them. Their parameters and implementation details are not publicly available, and even if they were, their huge size would make running them on commodity computing devices infeasible. Some large companies, such as Meta, have taken a different approach: the code as well as the parameters or weights of LLMs such as LLaMa