BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications
García, Andrés Fernández; de la Rosa, Javier; Gonzalo, Julio; Morante, Roser; Amigó, Enrique; Benito-Santos, Alejandro; Carrillo-de-Albornoz, Jorge; Fresno, Víctor; Ghajari, Adrian; Marco, Guillermo; Plaza, Laura; Salido, Eva Sánchez
arXiv.org Artificial Intelligence
The ability to summarize long documents succinctly is increasingly important in daily life due to information overload, yet such summaries are notably scarce for Spanish documents in general, and for the legal domain in particular. In this work, we present BOE-XSUM, a curated dataset comprising 3,648 concise, plain-language summaries of documents sourced from Spain's "Boletín Oficial del Estado" (BOE), the State Official Gazette. Each entry in the dataset includes a short summary, the original text, and its document type label. We evaluate the performance of medium-sized large language models (LLMs) fine-tuned on BOE-XSUM, comparing them to general-purpose generative models in a zero-shot setting. Results show that fine-tuned models significantly outperform their non-specialized counterparts. Notably, the best-performing model, BERTIN GPT-J 6B (32-bit precision), achieves a 24% relative performance gain over the top zero-shot model, DeepSeek-R1 (accuracies of 41.6% vs. 33.5%).
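The 24% figure is consistent with a relative (not absolute) improvement computed from the two reported accuracies. The following is a minimal sketch of that arithmetic; the function name `relative_gain` is an illustrative assumption and does not come from the paper, and only the two accuracy values are taken from the abstract.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Accuracies (%) as reported in the abstract.
fine_tuned = 41.6  # BERTIN GPT-J 6B, fine-tuned on BOE-XSUM
zero_shot = 33.5   # DeepSeek-R1, zero-shot

gain = relative_gain(fine_tuned, zero_shot)
absolute = fine_tuned - zero_shot

print(f"relative gain: {gain:.1f}%")
print(f"absolute gain: {absolute:.1f} points")
```

Read this way, the gap between the two models is about 8 accuracy points in absolute terms, which corresponds to roughly a 24% improvement relative to the zero-shot baseline.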
Sep-30-2025