NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates

May-26-2025, 22:19:50 GMT–Neural Information Processing Systems

However, existing benchmarks focus on outdated content and limited fields; they are difficult to update in real time and leave new terms unexplored. To address this problem, we propose NewTerm, an adaptive benchmark for real-time evaluation of new terms. We design a highly automated construction method that yields a high-quality benchmark with minimal human effort and allows flexible updates as new information appears. Empirical results on various LLMs show a performance reduction of over 20% caused by new terms. Additionally, while updating an LLM's knowledge cutoff can cover some new terms, the models still fail to generalize to more distant new terms.

large language model, natural language, real time system

Technology:
- Information Technology
  - Architecture > Real Time Systems (1.00)
  - Artificial Intelligence > Natural Language
    - Large Language Model (1.00)