Yuquan Wang
2023
Distantly Supervised Course Concept Extraction in MOOCs with Academic Discipline
Mengying Lu
|
Yuquan Wang
|
Jifan Yu
|
Yexing Du
|
Lei Hou
|
Juanzi Li
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the rapid growth of Massive Open Online Courses (MOOCs), it is expensive and time-consuming to extract high-quality knowledgeable concepts taught in the course by human effort to help learners grasp the essence of the course. In this paper, we propose to automatically extract course concepts using distant supervision to eliminate the heavy work of human annotations, which generates labels by matching them with an easily accessed dictionary. However, this matching process suffers from severe noisy and incomplete annotations because of the limited dictionary and diverse MOOCs. To tackle these challenges, we present a novel three-stage framework DS-MOCE, which leverages the power of pre-trained language models explicitly and implicitly and employs discipline-embedding models with a self-train strategy based on label generation refinement across different domains. We also provide an expert-labeled dataset spanning 20 academic disciplines. Experimental results demonstrate the superiority of DS-MOCE over the state-of-the-art distantly supervised methods (with 7% absolute F1 score improvement). Code and data are now available at https://github.com/THU-KEG/MOOC-NER.
2020
MOOCCube: A Large-scale Data Repository for NLP Applications in MOOCs
Jifan Yu
|
Gan Luo
|
Tong Xiao
|
Qingyang Zhong
|
Yuquan Wang
|
Wenzheng Feng
|
Junyi Luo
|
Chenyu Wang
|
Lei Hou
|
Juanzi Li
|
Zhiyuan Liu
|
Jie Tang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
The prosperity of Massive Open Online Courses (MOOCs) provides fodder for many NLP and AI research for education applications, e.g., course concept extraction, prerequisite relation discovery, etc. However, the publicly available datasets of MOOC are limited in size with few types of data, which hinders advanced models and novel attempts in related topics. Therefore, we present MOOCCube, a large-scale data repository of over 700 MOOC courses, 100k concepts, 8 million student behaviors with an external resource. Moreover, we conduct a prerequisite discovery task as an example application to show the potential of MOOCCube in facilitating relevant research. The data repository is now available at http://moocdata.cn/data/MOOCCube.