Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification
Bucher, Martin Juan José, Martini, Marco
arXiv.org Artificial Intelligence
Generative AI offers a simple, prompt-based alternative to fine-tuning smaller BERT-style LLMs for text classification tasks. This promises to eliminate the need for manually labeled training data and task-specific model training. However, it remains an open question whether tools like ChatGPT can deliver on this promise. In this paper, we show that smaller, fine-tuned LLMs (still) consistently and significantly outperform larger, zero-shot prompted models in text classification. We compare three major generative AI models (ChatGPT with GPT-3.5/GPT-4 and Claude Opus) with several fine-tuned LLMs across a diverse set of classification tasks (sentiment, approval/disapproval, emotions, party positions) and text categories (news, tweets, speeches). We find that fine-tuning with application-specific training data achieves superior performance in all cases. To make this approach more accessible to a broader audience, we provide an easy-to-use toolkit alongside this paper. Our toolkit, accompanied by non-technical step-by-step guidance, enables users to select and fine-tune BERT-like LLMs for any classification task with minimal technical and computational effort.
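The comparison described in the abstract amounts to scoring two sets of predictions against the same gold labels. The following is a minimal sketch of that scoring step, not the authors' toolkit: the abstract does not say which metrics the paper reports, so accuracy and macro-averaged F1 are assumptions, and the labels and predictions are hypothetical placeholders rather than results from the paper.

```python
def accuracy(y_true, y_pred):
    """Share of examples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted positives contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: gold labels plus predictions from two systems.
gold = ["pos", "neg", "neg", "pos", "neu", "neg"]
fine_tuned_preds = ["pos", "neg", "neg", "pos", "neu", "pos"]  # placeholder
zero_shot_preds = ["pos", "pos", "neg", "neu", "neu", "neg"]   # placeholder

for name, preds in [("fine-tuned", fine_tuned_preds), ("zero-shot", zero_shot_preds)]:
    print(f"{name}: accuracy={accuracy(gold, preds):.3f}, macro-F1={macro_f1(gold, preds):.3f}")
```

Macro averaging weights every class equally, which matters when label distributions are skewed, as is common for sentiment or emotion data; whether the paper averages this way is not stated in the abstract.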
Jun-12-2024
- Country:
  - North America
    - Canada (0.04)
    - United States > California
      - Santa Clara County > Palo Alto (0.04)
  - Europe
    - United Kingdom (0.04)
    - Germany (0.04)
    - Austria (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
    - Slovakia > Bratislava
      - Bratislava (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Law (0.93)
  - Government > Regional Government
    - North America Government > United States Government (0.93)
    - Europe Government (0.68)
- Technology: