A Unified View of Evaluation Metrics for Structured Prediction
Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, Benjamin Van Durme
arXiv.org Artificial Intelligence
We present a conceptual framework that unifies a variety of evaluation metrics for different structured prediction tasks (e.g., event and relation extraction, syntactic and semantic parsing). Our framework requires representing the outputs of these tasks as objects of certain data types, and derives metrics by matching common substructures, possibly followed by normalization. We demonstrate how commonly used metrics for a number of tasks can be succinctly expressed in this framework, and show that new metrics can be naturally derived in a bottom-up way from the output structure. We release a library that supports this derivation of new metrics. Finally, we consider how specific characteristics of tasks motivate metric design decisions, and suggest possible modifications to existing metrics in line with those motivations.
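The idea of "matching of common substructures, possibly followed by normalization" can be illustrated with a minimal sketch. This is not the authors' released library or its API. It assumes, for illustration only, that a task output is a set of hashable substructures (here, relation triples) and that matching is exact equality. The names `match_score` and `normalized_scores` are hypothetical.

```python
from typing import Hashable, Set, Tuple


def match_score(pred: Set[Hashable], ref: Set[Hashable]) -> int:
    """Unnormalized matching: count substructures present in both outputs.

    Assumption: substructures match only when they are exactly equal.
    """
    return len(pred & ref)


def normalized_scores(
    pred: Set[Hashable], ref: Set[Hashable]
) -> Tuple[float, float, float]:
    """Normalize the match count into precision, recall, and F1."""
    overlap = match_score(pred, ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical relation-extraction outputs, each a set of (head, relation, tail).
predicted = {
    ("Alice", "works_for", "Acme"),
    ("Bob", "born_in", "Paris"),
}
reference = {
    ("Alice", "works_for", "Acme"),
    ("Bob", "born_in", "Lyon"),
    ("Acme", "based_in", "Berlin"),
}

precision, recall, f1 = normalized_scores(predicted, reference)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here one of the two predicted triples matches one of the three reference triples, so normalizing the match count by each set's size gives precision 1/2 and recall 1/3. A richer treatment would replace exact equality with a similarity function over nested substructures, which this sketch does not attempt.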
Oct-20-2023
- Genre:
- Research Report (0.40)