Symbol-based entity marker highlighting for enhanced text mining in materials science with generative AI
Lee, Junhyeong, Yuk, Jong Min, Lee, Chan-Woo
arXiv.org Artificial Intelligence
The construction of experimental datasets is essential for expanding the scope of data-driven scientific discovery. Recent advances in natural language processing (NLP) have facilitated automatic extraction of structured data from unstructured scientific literature. While the two existing approaches (multi-step and direct methods) offer valuable capabilities, each has limitations when applied independently. Here, we propose a novel hybrid text-mining framework that integrates the advantages of both methods to convert unstructured scientific text into structured data. Our approach first transforms raw text into entity-recognized text, and subsequently into structured form. Furthermore, beyond the overall data structuring framework, we also enhance entity recognition performance by introducing an entity marker, a simple yet effective technique that uses symbolic annotations to highlight target entities. Specifically, our entity marker-based hybrid approach not only consistently outperforms previous entity recognition approaches across three benchmark datasets (MatScholar, SOFC, and SOFC slot NER) but also improves the quality of the final structured data, yielding up to a 58% improvement in entity-level F1 score and up to an 83% improvement in relation-level F1 score compared to the direct approach.
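As a rough illustration of the entity-marker idea described in the abstract, the sketch below wraps target entities in symbol pairs to produce entity-recognized text, then reads the entities back out as a first step toward structured data. The marker symbols (`@` and `#`), the labels `MAT` and `PRO`, and the function names `mark_entities` and `extract_entities` are all hypothetical choices made for this example; the abstract does not specify the symbols or labels the authors actually use, and in the paper the marking is performed by a generative model rather than from known character offsets.

```python
import re

# Hypothetical symbol pairs per entity type (assumption, not from the paper).
MARKERS = {"MAT": ("@", "@"), "PRO": ("#", "#")}


def mark_entities(text, entities):
    """Wrap each (start, end, label) span of `text` in its symbol pair."""
    out = []
    cursor = 0
    for start, end, label in sorted(entities):
        open_sym, close_sym = MARKERS[label]
        out.append(text[cursor:start])                      # untouched text before the span
        out.append(f"{open_sym}{text[start:end]}{close_sym}")  # highlighted entity
        cursor = end
    out.append(text[cursor:])                               # trailing text after the last span
    return "".join(out)


def extract_entities(marked):
    """Recover (surface form, label) pairs from marked text, in reading order."""
    found = []
    for label, (open_sym, close_sym) in MARKERS.items():
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        for match in re.finditer(pattern, marked):
            found.append((match.start(), match.group(1), label))
    return [(surface, label) for _, surface, label in sorted(found)]


text = "LiFePO4 shows high capacity."
marked = mark_entities(text, [(0, 7, "MAT"), (19, 27, "PRO")])
print(marked)
print(extract_entities(marked))
```

This toy version assumes the marker symbols never occur in the raw text and that spans do not overlap; a real pipeline would need to escape or choose symbols accordingly.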
May-12-2025
- Genre:
- Research Report (0.64)
- Workflow (0.69)
- Industry:
- Energy (1.00)