Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition

Danielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, Anatoly Efros

arXiv.org Artificial Intelligence 

Understanding user intents from UI interaction trajectories remains a challenging yet crucial frontier in intelligent agent development. Massive, datacenter-based multi-modal large language models (MLLMs) have greater capacity to handle the complexities of such sequences. Smaller models, which can run on-device to provide a privacy-preserving, low-cost, and low-latency user experience, struggle with accurate intent inference. We address this limitation by introducing a novel decomposed approach with two stages. First, we perform structured interaction summarization, capturing key information from each user action. Second, we perform intent extraction using a fine-tuned model operating on the aggregated summaries. This method improves intent understanding in resource-constrained models, even surpassing the base performance of large MLLMs.
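The two-stage decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Interaction` and `InteractionSummary` schemas, the function names, and the stub callables standing in for the summarization model and the fine-tuned intent model are all hypothetical, chosen only to show how per-action summaries might be aggregated before intent extraction.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Interaction:
    """One step of a UI trajectory (hypothetical schema)."""
    screen: str  # textual description of the screen the user saw
    action: str  # what the user did on that screen


@dataclass
class InteractionSummary:
    """Structured summary of a single interaction (hypothetical fields)."""
    step: int
    screen_context: str
    user_action: str


def summarize_trajectory(
    trajectory: Sequence[Interaction],
    summarize: Callable[[Interaction], Tuple[str, str]],
) -> List[InteractionSummary]:
    """Stage 1: summarize each interaction independently of the others."""
    summaries = []
    for step, interaction in enumerate(trajectory, start=1):
        context, action = summarize(interaction)
        summaries.append(InteractionSummary(step, context, action))
    return summaries


def aggregate(summaries: Sequence[InteractionSummary]) -> str:
    """Join the per-step summaries into one text input for stage 2."""
    return "\n".join(
        f"{s.step}. [{s.screen_context}] {s.user_action}" for s in summaries
    )


def extract_intent(
    summaries: Sequence[InteractionSummary],
    infer: Callable[[str], str],
) -> str:
    """Stage 2: infer the overall intent from the aggregated summaries."""
    return infer(aggregate(summaries))


# Stub callables standing in for the two models (assumptions, not real models).
def stub_summarizer(interaction: Interaction) -> Tuple[str, str]:
    return interaction.screen.strip(), interaction.action.strip()


def stub_intent_model(text: str) -> str:
    return f"Intent inferred from {len(text.splitlines())} steps"


trajectory = [
    Interaction(screen="Flight search results ", action="tap 'Cheapest'"),
    Interaction(screen="Flight details", action=" tap 'Book'"),
]
summaries = summarize_trajectory(trajectory, stub_summarizer)
intent = extract_intent(summaries, stub_intent_model)
```

Passing the models in as callables keeps the two stages separable in this sketch, so either stub could be replaced by a call to an actual on-device model without changing the pipeline structure.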