LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!

Aug-30-2024–arXiv.org Artificial Intelligence

Multimodal Large Language Models (MM-LLMs) have advanced significantly over the last year, demonstrating impressive performance across tasks. While closed-source models such as GPT-4o, Claude, and Gemini top the leaderboards [1], LLaVa [2] and its variants remain some of the best open-source MM-LLMs and are widely adopted by the developer community. To truly democratize AI, models need more than strong capabilities: they must also run efficiently on the small compute footprints available to most users. Small Language Models (SLMs) address this gap by scaling down the number of parameters (typically to fewer than 3B) while keeping architecture choices and pre-training token exposure close to those of their larger counterparts. As a result, models such as Phi [3], Gemma-2b [4], and OLMo [5] perform strongly across benchmarks with a smaller memory footprint and lower compute latency. Weight quantization offers an additional knob for shrinking model size further while balancing performance.
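To make the quantization knob concrete, the Python sketch below assumes a BitNet b1.58-style absmean scheme, which is one common way to reach the "ternary" weights in the title: every weight is divided by the tensor's mean absolute value, rounded, and clipped to {-1, 0, +1}. This excerpt does not describe the quantizer, so the scheme, the hypothetical helpers `absmean_ternary`, `dequantize`, and `weight_memory_gb`, and the `eps` value are illustrative assumptions. The sketch also estimates weight storage for a hypothetical 1B-parameter model.

```python
import math

import numpy as np


def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale.

    Hypothetical helper sketching BitNet b1.58-style absmean quantization;
    an assumption for illustration, not code from the LLaVaOLMoBitnet1B paper.
    """
    gamma = float(np.abs(w).mean())  # per-tensor scale: mean absolute weight
    q = np.clip(np.round(w / (gamma + eps)), -1, 1).astype(np.int8)
    return q, gamma


def dequantize(q: np.ndarray, gamma: float) -> np.ndarray:
    """Approximate the original weights as scale * ternary code."""
    return q.astype(np.float32) * gamma


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone (ignores activations, KV cache, overhead)."""
    return n_params * bits_per_weight / 8 / 1e9


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix
q, gamma = absmean_ternary(w)
w_hat = dequantize(q, gamma)

print("distinct codes:", sorted(np.unique(q).tolist()))
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))

# Hypothetical 1B-parameter model, weights only.
print("fp16:", weight_memory_gb(1e9, 16), "GB")  # 2.0 GB
print("ternary (log2(3) bits):", round(weight_memory_gb(1e9, math.log2(3)), 3), "GB")
```

The log2(3) ≈ 1.58 bits per weight figure is a theoretical lower bound. Practical kernels often pack ternary weights at 2 bits each, and some layers (for example embeddings) may be kept at higher precision, so the real savings are somewhat smaller than this estimate.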
