Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models
Neural Information Processing Systems
Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks and fall short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by reducing task-specific engineering effort and by offering users interactive conversations. Motivated by this potential and these challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality. It can therefore comprehensively evaluate the abilities of LLMs as general shop assistants.
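Because the benchmark groups many tasks under a few skills, a natural way to report results is per skill. The sketch below illustrates one such aggregation: task scores are averaged within each skill, then skills are averaged with equal weight. This is an assumption for illustration, not the paper's stated scoring protocol. The task names, scores, and the helpers `skill_scores` and `overall_score` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The four skill categories named in the abstract.
SKILLS = (
    "concept understanding",
    "knowledge reasoning",
    "user behavior alignment",
    "multi-linguality",
)

# Hypothetical per-task results: (task_name, skill, score in [0, 1]).
# Names and scores are illustrative placeholders, not Shopping MMLU data.
results = [
    ("task_a", "concept understanding", 0.80),
    ("task_b", "concept understanding", 0.60),
    ("task_c", "knowledge reasoning", 0.50),
    ("task_d", "user behavior alignment", 0.40),
    ("task_e", "multi-linguality", 0.70),
]


def skill_scores(results):
    """Average task scores within each skill (assumed macro-averaging)."""
    by_skill = defaultdict(list)
    for _task, skill, score in results:
        if skill not in SKILLS:
            raise ValueError(f"unknown skill: {skill}")
        by_skill[skill].append(score)
    # Skills with no tasks in `results` are simply omitted.
    return {skill: mean(scores) for skill, scores in by_skill.items()}


def overall_score(results):
    """Unweighted mean over skills, so each skill counts equally (assumption)."""
    return mean(skill_scores(results).values())


if __name__ == "__main__":
    for skill, score in skill_scores(results).items():
        print(f"{skill}: {score:.3f}")
    print(f"overall: {overall_score(results):.3f}")
```

Averaging per skill first keeps a skill with many tasks from dominating the overall number. A task-weighted mean would be an equally valid choice, depending on what the evaluation is meant to emphasize.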
May-26-2025, 18:23:47 GMT