LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!

Jainaveen Sundaram, Ravi Iyer

arXiv.org Artificial Intelligence 

Multimodal Large Language Models (MM-LLMs) have seen significant advancements in the last year, demonstrating impressive performance across tasks. While closed-source models such as GPT-4o, Claude, and Gemini top the leaderboards [1], LLaVa [2] and its variants remain some of the best open-source MM-LLMs, with strong adoption by the developer community. To truly democratize AI, models need more than strong capabilities: they must also run efficiently on small compute footprints accessible to most users. Small Language Models (SLMs) address this gap by scaling down the number of parameters (typically <3B) while keeping architecture choices and pre-training token exposure close to those of their larger counterparts. As a result, models such as Phi [3], Gemma-2b [4], and OLMo [5] exhibit strong performance across benchmarks with a smaller memory footprint and lower compute latency. Weight quantization offers an additional knob to further shrink model size while balancing performance.
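
As a rough illustration of what ternary weight quantization means, the sketch below maps a handful of floating-point weights to the values {-1, 0, +1} plus a single scale factor. The text above does not specify the quantization scheme; the absmean rule used here (scale by the mean absolute weight, then round and clip) is an assumption borrowed from the BitNet b1.58 line of work, and the function names and sample weights are hypothetical.

```python
def ternarize(weights, eps=1e-8):
    """Map a flat list of float weights to {-1, 0, +1} plus one scale.

    Assumption: absmean scaling (mean of |w|), then round and clip.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from ternary values and the scale."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, for illustration only.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
q, s = ternarize(weights)
approx = dequantize(q, s)
print(q, s, approx)
```

Each weight then carries one of three values (about 1.58 bits of information), compared with 16 bits for a half-precision weight, which is where the reduction in model size comes from. The price is the approximation error visible when comparing `approx` with `weights`.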
