Arabic Cross-Document NLP for the Hadith and Biography Literature
Zaraket, Fadi (American University of Beirut) | Makhlouta, Jad (American University of Beirut)
Cross-document integration and reconciliation of extracted information has recently become of interest to researchers in Arabic natural language processing. Given a set of documents $A$, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities $N_a$ and relations $R_a$, expressed as edges in a graph $G = (N_a, R_a)$. We use the same techniques to extract entities $N_b$ and relations $R_b$ from a separate set of documents $B$. We use $G$ to disambiguate $N_b$ and $R_b$, and we integrate the resulting entities back into $G$ by annotating the nodes and edges in $G$ with elements from $N_b$. We apply our approach iteratively. Our results show a significant increase in accuracy, from 41% to 93%, after applying this cross-document NLP methodology to hadith and biography documents.
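The disambiguate-and-annotate step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the transliterated toy names, and the scoring rule (counting shared graph neighbours) are all assumptions made for illustration, and plain token matching stands in for the Arabic morphological analysis and finite state machines used in the paper.

```python
from collections import defaultdict


def build_graph(relations):
    """Build a toy G = (N_a, R_a) from (narrator, source) pairs."""
    nodes = set()
    neighbours = defaultdict(set)
    for src, dst in relations:
        nodes.update((src, dst))
        # Store edges in both directions so context can match either side.
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return nodes, neighbours


def disambiguate(mention, context, graph):
    """Return the node of G that contains the mention as a name token and
    shares the most neighbours with the mention's context, or None."""
    nodes, neighbours = graph
    candidates = sorted(n for n in nodes if mention in n.split())
    if not candidates:
        return None
    return max(candidates, key=lambda n: len(neighbours[n] & context))


def integrate(graph, mentions):
    """Annotate nodes of G with notes attached to resolved mentions from B."""
    annotations = defaultdict(list)
    for mention, context, note in mentions:
        node = disambiguate(mention, context, graph)
        if node is not None:
            annotations[node].append(note)
    return annotations


# Hypothetical relations extracted from document set A (toy data).
relations_a = [
    ("Malik ibn Anas", "Nafi"),
    ("Nafi", "Ibn Umar"),
    ("Muhammad ibn Sirin", "Anas ibn Malik"),
]
# Hypothetical ambiguous mentions extracted from document set B (toy data).
mentions_b = [
    ("Malik", {"Nafi"}, "biography entry 1"),
    ("Malik", {"Muhammad ibn Sirin"}, "biography entry 2"),
]

graph = build_graph(relations_a)
annotations = integrate(graph, mentions_b)
```

In this toy data the bare mention "Malik" matches two nodes, and the surrounding names in each biography entry decide which node receives the annotation. The iterative application mentioned in the abstract would presumably repeat this step as newly annotated nodes make further mentions resolvable, but that loop is not modelled here.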
May-20-2012
- Country:
- Asia > Middle East
- Iraq (0.04)
- Lebanon > Beirut Governorate
- Beirut (0.04)
- Europe > Slovenia (0.04)
- Genre:
- Research Report > New Finding (0.54)
- Industry:
- Education (0.46)