GRAB: A Risk Taxonomy--Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures
–arXiv.org Artificial Intelligence
Risk categorization in 10-K risk disclosures matters for oversight and investment, yet no public benchmark evaluates unsupervised topic models for this task. We present GRAB, a finance-specific benchmark with 1.61M sentences from 8,247 filings and span-grounded sentence labels produced without manual annotation by combining FinBERT token attention, YAKE keyphrase signals, and taxonomy-aware collocation matching. Labels are anchored in a risk taxonomy mapping 193 terms to 21 fine-grained types nested under five macro classes; the 21 types guide weak supervision, while evaluation is reported at the macro level. GRAB unifies evaluation with fixed dataset splits and robust metrics--Accuracy, Macro-F1, Topic BERTScore, and the entropy-based Effective Number of Topics. The dataset, labels, and code enable reproducible, standardized comparison across classical, embedding-based, neural, and hybrid topic models on financial disclosures.
arXiv.org Artificial Intelligence
Sep-29-2025
- Country:
- Asia
- China > Beijing
- Beijing (0.04)
- Middle East > Jordan (0.04)
- China > Beijing
- Europe
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- France (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Croatia > Dubrovnik-Neretva County
- North America > United States
- Pennsylvania (0.04)
- Asia
- Genre:
- Research Report (0.65)
- Industry:
- Banking & Finance > Trading (0.47)
- Law > Business Law (0.70)
- Technology: