ISLRN

CMRC 2018 Dataset

Full Official Name: A Chinese Reading Comprehension Dataset for the 2nd Chinese Machine Reading Comprehension Evaluation (CMRC 2018)

Submission date: Oct. 22, 2018, 3:22 p.m.

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, most existing reading comprehension datasets are in English. In this paper, we introduce a span-extraction dataset for Chinese machine reading comprehension to add linguistic diversity to this area. The dataset consists of nearly 20,000 real questions annotated by humans on Wikipedia paragraphs. We also annotated a challenge set containing questions that require comprehensive understanding and multi-sentence inference across the context. With the release of the dataset, we hosted the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018).

Creator(s)

Li Xiao

iFLYTEK Research - Yiming Cui

Ting Liu

Zhipeng Chen

Shijin Wang

Guoping Hu

Wentao Ma

Distributor(s)

iFLYTEK Research - Yiming Cui

Right Holder(s)

iFLYTEK Research - Yiming Cui

Status : Accepted

ISLRN :

013-662-947-043-2

Version

v1.0

Source

https://github.com/ymcui/cmrc2018

Resource Type

Primary Text

Media Type

Text

Language(s)

Chinese

Access Medium