CMRC 2018 Dataset

Full Official Name: A Chinese Reading Comprehension Dataset for the 2nd Chinese Machine Reading Comprehension Evaluation (CMRC 2018)
Submission date: Oct. 22, 2018, 3:22 p.m.

Machine Reading Comprehension (MRC) has recently attracted considerable attention. However, most existing reading comprehension datasets are in English. In this paper, we introduce a span-extraction dataset for Chinese Machine Reading Comprehension to add linguistic diversity to this area. The dataset consists of nearly 20,000 real questions annotated by humans on Wikipedia paragraphs. We also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference across the context. Alongside the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).
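In a span-extraction task, every answer is a contiguous substring of the given passage, so an answer can be stored as its text plus a character offset into the context. The sketch below illustrates that idea. It assumes a SQuAD-style record layout (the field names "context", "qas", "answers", "text" and "answer_start" are assumptions), and the record itself is a hypothetical example, not taken from the dataset. The helper name extract_span is likewise hypothetical.

```python
# Hypothetical record, assuming a SQuAD-style layout with character offsets.
# The content is illustrative only and does not come from CMRC 2018.
record = {
    "context": "中华人民共和国的首都是北京。",
    "qas": [
        {
            "id": "EXAMPLE_0",
            "question": "中华人民共和国的首都是哪里？",
            "answers": [{"text": "北京", "answer_start": 11}],
        }
    ],
}


def extract_span(context: str, answer_start: int, text: str) -> str:
    """Return the span of `context` at `answer_start` with the length of `text`.

    Raises ValueError if that span does not equal `text`, i.e. the stored
    answer is not a contiguous span of the context at the given offset.
    Offsets count Unicode characters, which is how Python indexes str.
    """
    span = context[answer_start:answer_start + len(text)]
    if span != text:
        raise ValueError(f"answer {text!r} not found at offset {answer_start}")
    return span


for qa in record["qas"]:
    for ans in qa["answers"]:
        answer = extract_span(record["context"], ans["answer_start"], ans["text"])
        print(qa["question"], "->", answer)
```

Checking each stored offset against the context this way is a cheap sanity test when loading span-extraction data, since a model for such a task predicts start and end positions rather than free-form text.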
