Assigning Species Information to Corresponding Genes by a Sequence Labeling Framework
Luo, Ling, Wei, Chih-Hsuan, Lai, Po-Ting, Chen, Qingyu, Doğan, Rezarta Islamaj, Lu, Zhiyong
arXiv.org Artificial Intelligence
The automatic assignment of species information to the corresponding genes in a research article is a critical step in the gene normalization task, in which a text-mining algorithm normalizes a gene mention and links it to a database record or identifier. Existing methods typically rely on heuristic rules based on gene and species co-occurrence in the article, but their accuracy is suboptimal. We therefore developed a high-performance method, built on a novel deep learning-based framework, to classify whether a relation holds between a gene and a species. Instead of the traditional binary classification framework, in which all possible pairs of genes and species in the same article are evaluated, we treat the problem as a sequence-labeling task, so that only a fraction of the pairs needs to be considered. Our benchmarking results show that our approach performs significantly better than the rule-based baseline method on the species assignment task, improving accuracy from 65.8% to 81.3%.
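To make the contrast between the two framings concrete, the sketch below counts the instances each one would produce for a toy sentence. It is only an illustration under stated assumptions: the abstract does not say how the sequences are constructed, so the "one tagged sequence per gene" scheme, the tag names (`ANCHOR`, `CANDIDATE`, `O`), the function names, and the example sentence are all hypothetical and not taken from the paper.

```python
from itertools import product


def pairwise_instances(genes, species):
    """Binary-classification framing: one instance per (gene, species) pair."""
    return list(product(genes, species))


def sequence_labeling_instances(tokens, genes, species):
    """Sequence-labeling framing (assumed scheme, not the paper's exact one).

    Builds one tagged sequence per gene: the gene is marked as the anchor and
    every species mention is a position a tagger would label in a single pass.
    """
    instances = []
    for gene in genes:
        tags = []
        for tok in tokens:
            if tok == gene:
                tags.append("ANCHOR")
            elif tok in species:
                tags.append("CANDIDATE")
            else:
                tags.append("O")
        instances.append((gene, tags))
    return instances


# Hypothetical toy input: three gene mentions and three species mentions.
tokens = "human BRCA1 and mouse Brca1 differ from zebrafish tp53".split()
genes = ["BRCA1", "Brca1", "tp53"]
species = ["human", "mouse", "zebrafish"]

pairs = pairwise_instances(genes, species)
sequences = sequence_labeling_instances(tokens, genes, species)
print(len(pairs), len(sequences))
```

Under these assumptions the pairwise framing grows with the product of gene and species mentions, while the sequence framing grows with the number of genes alone, which is one way to read the abstract's remark that only a fraction of the pairs needs to be considered.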
May-8-2022