WritingBench: A Comprehensive Benchmark for Generative Writing

Jun-17-2026, 02:04:55 GMT–Neural Information Processing Systems

Recent advancements in large language models (LLMs) have significantly enhanced text generation capabilities, yet evaluating their performance in generative writing remains a challenge. Existing benchmarks primarily focus on generic text generation or cover only a limited range of writing tasks, failing to capture the diverse requirements of high-quality written content across various domains. To bridge this gap, we present WritingBench, a comprehensive benchmark designed to evaluate LLMs across 6 core writing domains and 100 subdomains. We further propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations of style, format, and length. The framework's validity is further demonstrated by its data curation capability, which enables a 7B-parameter model to outperform GPT-4o in writing. We open-source the benchmark, along with evaluation tools and modular framework components, to advance the development of LLMs in writing.
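The query-dependent evaluation described above can be pictured as a two-stage loop: derive assessment criteria from the query itself, then score the response once per criterion. The sketch below is a minimal illustration under stated assumptions, not the WritingBench implementation. The `Criterion` type, `evaluate_response`, the toy criteria generator and critic, the 1-10 scale, and mean aggregation are all hypothetical stand-ins for the LLM-generated criteria and the fine-tuned critic model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Criterion:
    """One instance-specific assessment criterion (hypothetical structure)."""
    name: str         # e.g. "format", "length", "style"
    description: str  # what the critic should check for this query


# Hypothetical signatures: in a real system these would wrap LLM calls.
CriteriaGenerator = Callable[[str], List[Criterion]]
CriticScorer = Callable[[str, str, Criterion], float]


def evaluate_response(
    query: str,
    response: str,
    generate_criteria: CriteriaGenerator,
    critic_score: CriticScorer,
) -> float:
    """Score a response against criteria derived from its own query."""
    # Stage 1: derive criteria specific to this query instance.
    criteria = generate_criteria(query)
    if not criteria:
        raise ValueError("no criteria generated for query")
    # Stage 2: score the response once per criterion (assumed 1-10 scale).
    scores = [critic_score(query, response, c) for c in criteria]
    # Aggregate with a plain mean; this rule is an assumption, not the paper's.
    return mean(scores)


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_criteria(query: str) -> List[Criterion]:
    criteria = [Criterion("style", "Tone matches the request")]
    # Only add a length criterion when the query mentions a word budget.
    if "words" in query:
        criteria.append(Criterion("length", "Respects the requested word count"))
    return criteria


def toy_critic(query: str, response: str, criterion: Criterion) -> float:
    # Crude length check: full marks if the response stays within 50 words.
    if criterion.name == "length":
        return 10.0 if len(response.split()) <= 50 else 4.0
    # Fixed placeholder score for every other criterion.
    return 7.0


if __name__ == "__main__":
    q = "Write a formal legal notice in under 50 words."
    r = "Notice is hereby given that the lease terminates on 1 July."
    print(evaluate_response(q, r, toy_criteria, toy_critic))  # 8.5
```

Because the criteria list is built per query, a request with an explicit word budget is judged on length while an open-ended request is not; a weighted average instead of the plain mean would be an equally plausible design choice.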

large language model, machine learning, natural language
