TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation

Ruiz, Alfredo Garrachón, de la Rosa, Tomás, Borrajo, Daniel

arXiv.org Artificial Intelligence 

Large language models (LLMs) have shown remarkable capabilities across a wide range of tasks, from natural language understanding to creative content generation. However, the computational cost of inference and the associated energy consumption present significant challenges. As the demand for AI applications continues to grow, these costs are expected to escalate, raising concerns about sustainability and accessibility (Wu et al., 2022).

This approach is orthogonal to other optimization techniques, and could remain applicable as LLMs continue to grow in size and capabilities. We also propose an algorithm to check and define the applicability of this technique in different domains, selecting the most appropriate set of function words, and analyzing the loss in performance as the percentage of saved tokens increases. Additionally, we provide an experimental evaluation in the context of general
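As a rough illustration of the idea behind TRIM (not the paper's actual implementation), the sketch below drops a chosen set of function words from a text and reports the fraction of tokens saved. The function-word set and the whitespace tokenization here are assumptions for demonstration only; the paper selects the set per domain.

```python
# Illustrative sketch of function-word removal for token reduction.
# FUNCTION_WORDS is a hypothetical set; the paper selects it per domain.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are"}

def trim_text(text: str) -> tuple[str, float]:
    """Remove function words and report the fraction of tokens saved."""
    tokens = text.split()
    kept = [t for t in tokens if t.lower() not in FUNCTION_WORDS]
    saved = 1 - len(kept) / len(tokens) if tokens else 0.0
    return " ".join(kept), saved

if __name__ == "__main__":
    trimmed, saved = trim_text("The cost of inference is expected to grow")
    print(trimmed)           # "cost inference expected grow"
    print(f"{saved:.0%}")    # "50%" of the tokens were removed
```

In practice the trade-off the abstract describes applies: the larger the function-word set, the higher the token savings, but the greater the risk of degraded output quality.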