Cross Encoding as Augmentation: Towards Effective Educational Text Classification

Lee, Hyun Seung, Choi, Seungtaek, Lee, Yunsung, Moon, Hyeongdon, Oh, Shinhyeok, Jeong, Myeongho, Go, Hyojun, Wallraven, Christian

May-30-2023–arXiv.org Artificial Intelligence

Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) it has a large tag space and 2) it is multi-label. Although retrieval approaches reportedly perform well in low-resource scenarios, few efforts have directly addressed the data scarcity problem. To mitigate these issues, we propose CEAA, a novel retrieval approach that provides effective learning in educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method that introduces cross-encoder style texts to a bi-encoder architecture, retaining the bi-encoder's more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and on low-resource tags compared to state-of-the-art models.
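
The retrieval framing of auto-tagging can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the bag-of-words `embed` function stands in for a learned encoder, and `cross_style_augment` is a hypothetical helper showing only what a concatenated, cross-encoder style input text might look like. It is not the CEAA implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_tags(question, tags, top_k=2):
    """Bi-encoder style retrieval: encode question and tags separately, then rank.

    Returning the top_k tags (rather than one) reflects the multi-label setting.
    """
    q = embed(question)
    scored = [(cosine(q, embed(tag)), tag) for tag in tags]
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:top_k]]


def cross_style_augment(question, tag, sep=" [SEP] "):
    """Hypothetical augmentation: a joint question+tag text, as a cross-encoder would see it."""
    return question + sep + tag


tags = ["linear equations", "photosynthesis in plants", "quadratic equations"]
question = "solve the linear equations for x"
predicted = rank_tags(question, tags)
augmented = cross_style_augment(question, predicted[0])
```

Because tag vectors do not depend on the question, they can be encoded once and reused for every query, which is the usual efficiency argument for a bi-encoder over a cross-encoder.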

machine learning, natural language, text classification

Country:
- Asia > India (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Text Classification (0.92)
