Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
CH-Wang, Sky; Deshpande, Darshan; Muresan, Smaranda; Kannappan, Anand; Qian, Rebecca
arXiv.org Artificial Intelligence
We introduce Browsing Lost Unformed Recollections (BLUR), a tip-of-the-tongue known-item search and reasoning benchmark for general AI assistants. BLUR comprises 573 real-world validated questions; to excel on them, a system must search and reason across multi-modal and multilingual inputs and use tools proficiently. Humans easily ace these questions (scoring 98% on average), while the best-performing system scores around 56%. To facilitate progress on this challenging and aspirational use case for general AI assistants, we release 350 questions through a public leaderboard, retain the answers to 250 of them, and keep the remaining 223 questions as a private test set.
Mar-24-2025
- Country:
- Asia (0.68)
- Genre:
- Research Report (0.82)
- Industry:
- Leisure & Entertainment (1.00)
- Media (1.00)
- Technology: