Resume Information Extraction via Post-OCR Text Processing
Helli, Selahattin Serdar; Tanberk, Senem; Cavsak, Sena Nur
arXiv.org Artificial Intelligence
Information extraction (IE), one of the main tasks of natural language processing (NLP), has recently gained importance in the processing of resumes. Previous studies on extracting information from CV text have generally relied on sentence classification with NLP models. This study aims to extract information by classifying all of the text groups in a resume after pre-processing steps such as Optical Character Recognition (OCR) and object recognition with the YOLOv8 model. The text dataset consists of 286 resumes collected for 5 different job descriptions (education, experience, talent, personal and language) in the IT industry. The dataset created for object recognition consists of 1198 resumes, which were collected from open sources on the internet and labeled as sets of text. BERT, BERT-t, DistilBERT, RoBERTa and XLNet were used as models. Variances in F1 score were used to compare the model results. In addition, YOLOv8 results are reported and compared among themselves. In the comparison, DistilBERT showed better results than the other models despite having fewer parameters.
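The abstract compares classifiers by F1 score over the five text-group labels. As a minimal sketch of how such a score can be computed, the snippet below derives per-class and macro-averaged F1 from gold and predicted labels. The averaging scheme (macro) and the toy label sequences are assumptions made for illustration; the abstract does not state which averaging the authors used, and the data here is not from the paper.

```python
from collections import Counter

# The five text-group labels named in the abstract.
LABELS = ["education", "experience", "talent", "personal", "language"]


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label gets a false positive
            fn[gold] += 1  # gold label gets a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A label that never appears in gold or predictions scores 0.0.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


if __name__ == "__main__":
    # Hypothetical gold and predicted labels for six text groups.
    gold = ["education", "experience", "talent", "personal", "language", "education"]
    pred = ["education", "experience", "talent", "personal", "education", "education"]
    print(per_class_f1(gold, pred))
    print(round(macro_f1(gold, pred), 2))
```

In this toy run the single "language" group is misclassified as "education", so "language" scores 0.0 and "education" drops to 0.8, giving a macro F1 of about 0.76. Macro averaging weights every label equally, which matters when some resume sections are much rarer than others.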
Jun-23-2023
- Country:
- Europe > Middle East
- Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia
- Macao (0.04)
- India (0.04)
- China (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Afghanistan > Kabul Province
- Kabul (0.04)
- Genre:
- Research Report (0.70)
- Technology:
- Information Technology
- Data Science > Data Mining
- Text Mining (0.71)
- Artificial Intelligence
- Vision > Optical Character Recognition (0.55)
- Natural Language
- Text Processing (0.87)
- Information Extraction (0.71)
- Machine Translation (0.69)
- Machine Learning
- Pattern Recognition (0.67)
- Neural Networks > Deep Learning (0.47)