Resource Mention Extraction for MOOC Discussion Forums
An, Ya-Hui, Pan, Liangming, Kan, Min-Yen, Dong, Qiang, Fu, Yan
arXiv.org Artificial Intelligence
Corresponding author: Liangming Pan (peterpan10211020@gmail.com). Preprint submitted to Elsevier, November 22, 2018.
In discussions hosted on forums for Massive Open Online Courses (MOOCs), references to online learning resources are often of central importance. However, they are usually mentioned in free text, without hyperlinks to the associated resources. Automated hyperlinking and categorization of learning resource mentions will facilitate discussion and searching within MOOC forums, and also benefit the contextualization of such resources across disparate views. We propose the novel problem of learning resource mention identification in MOOC forums; i.e., to identify resource mentions in discussions and classify them into predefined resource types. As this is a novel task with no publicly available data, we first contribute a large-scale labeled dataset, dubbed the Forum Resource Mention (FoRM) dataset, to facilitate our current research and future research on this task. FoRM contains over 10,000 real-world forum threads, collected in collaboration with Coursera, with more than 23,000 manually labeled resource mentions. We then formulate this task as a sequence tagging problem and investigate solution architectures to address the problem. We identify two major challenges that hinder the application of sequence tagging models to the task: (1) the diversity of resource mention expression, and (2) long-range contextual dependencies. We address these challenges by incorporating character-level and thread context information into an LSTM-CRF model. First, we incorporate a character encoder to address the out-of-vocabulary problem caused by the diversity of mention expressions. Second, to address the context dependency challenge, we encode thread contexts using an RNN-based context encoder, and apply the attention mechanism to selectively leverage useful context information during sequence tagging.
Experiments on FoRM show that the proposed method notably improves on the baseline deep sequence tagging models, with significant gains on instances that exemplify the two challenges.
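To make the sequence tagging formulation concrete, the following minimal sketch shows how per-token tags could be turned back into typed resource mentions. It assumes a BIO tagging scheme and uses the hypothetical resource types VIDEO and QUIZ; the abstract names neither the tagging scheme nor the resource types, and the helper `bio_to_mentions` is illustrative only, not the authors' code. It covers only the decoding step, not the LSTM-CRF model that would predict the tags.

```python
from typing import List, Optional, Tuple


def bio_to_mentions(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (mention text, resource type) pairs.

    Assumption: tags look like "B-TYPE", "I-TYPE", or "O". The type names
    used in the example below are hypothetical, not taken from FoRM.
    """
    mentions: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any mention that is still open.
            if current_tokens:
                mentions.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continue the open mention of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open mention: close and reset.
            if current_tokens:
                mentions.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Close a mention that runs to the end of the sequence.
    if current_tokens:
        mentions.append((" ".join(current_tokens), current_type))
    return mentions


# Illustrative forum sentence with hand-written (not model-predicted) tags.
example_tokens = ["The", "answer", "is", "in", "lecture", "video", "2",
                  "and", "quiz", "3", "."]
example_tags = ["O", "O", "O", "O", "B-VIDEO", "I-VIDEO", "I-VIDEO",
                "O", "B-QUIZ", "I-QUIZ", "O"]
print(bio_to_mentions(example_tokens, example_tags))
```

In this framing, identifying a mention and classifying its type happen in a single pass, since the type is carried by the tag itself.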
Nov-21-2018
- Country:
  - Asia
    - China > Sichuan Province
      - Chengdu (0.04)
    - Singapore > Central Region
      - Singapore (0.04)
  - Europe > United Kingdom
    - England > Oxfordshire > Oxford (0.04)
  - North America > United States
    - California > Santa Clara County > Palo Alto (0.04)
- Genre:
  - Instructional Material
    - Course Syllabus & Notes (1.00)
    - Online (1.00)
  - Research Report (1.00)
- Industry: