GRAB: A Risk Taxonomy--Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures

Sep-29-2025–arXiv.org Artificial Intelligence

Risk categorization in 10-K risk disclosures matters for oversight and investment, yet no public benchmark evaluates unsupervised topic models for this task. We present GRAB, a finance-specific benchmark with 1.61M sentences from 8,247 filings and span-grounded sentence labels produced without manual annotation by combining FinBERT token attention, YAKE keyphrase signals, and taxonomy-aware collocation matching. Labels are anchored in a risk taxonomy mapping 193 terms to 21 fine-grained types nested under five macro classes; the 21 types guide weak supervision, while evaluation is reported at the macro level. GRAB unifies evaluation with fixed dataset splits and robust metrics--Accuracy, Macro-F1, Topic BERTScore, and the entropy-based Effective Number of Topics. The dataset, labels, and code enable reproducible, standardized comparison across classical, embedding-based, neural, and hybrid topic models on financial disclosures.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Sep-29-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.68)

Genre:
- Research Report (0.65)

Industry:
- Law > Business Law (0.70)
- Banking & Finance > Trading (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Discourse & Dialogue (0.56)
  - Machine Learning > Statistical Learning (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found