Resource: CMRC 2019 Dataset

Reference A Chinese Reading Comprehension Dataset for the 3rd Chinese Machine Reading Comprehension Evaluation (CMRC 2019)
Date of Submission July 7, 2020, 3:24 p.m.
Status accepted
ISLRN 813-010-842-493-2
Resource Type Primary Text
Media Type Text
Language Chinese

We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The task is to fill each blank in a passage with the correct candidate sentence. To make the task harder, we also created fake candidates that resemble the correct ones, so the machine must judge each candidate's correctness in context. The dataset contains over 100K blanks (questions) across more than 10K passages drawn from Chinese narrative stories. To evaluate the dataset, we implemented several baseline systems based on pre-trained models; the results show that the state-of-the-art model still falls short of human performance by a large margin.
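The task described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dataset release: the `ClozePassage` record, its field names, the `[BLANK1]`-style markers, and the per-blank accuracy measure are all hypothetical, and the toy passage is in English only for readability.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClozePassage:
    """Hypothetical record for one SC-MRC passage (field names are assumptions)."""
    context: str        # passage text with [BLANK1], [BLANK2], ... markers
    choices: List[str]  # candidate sentences, possibly including fake ones
    answers: List[int]  # answers[i] is the index in `choices` for blank i + 1


def fill_blanks(passage: ClozePassage, predicted: List[int]) -> str:
    """Return the passage text with each blank replaced by the predicted candidate."""
    text = passage.context
    for blank_no, choice_idx in enumerate(predicted, start=1):
        text = text.replace(f"[BLANK{blank_no}]", passage.choices[choice_idx])
    return text


def blank_accuracy(passages: List[ClozePassage], predictions: List[List[int]]) -> float:
    """Fraction of blanks whose predicted candidate index matches the gold index."""
    correct = 0
    total = 0
    for passage, predicted in zip(passages, predictions):
        for gold, guess in zip(passage.answers, predicted):
            correct += int(gold == guess)
            total += 1
    return correct / total if total else 0.0


# Toy example: two blanks, three candidates (the third one is a fake candidate).
toy = ClozePassage(
    context="The fox saw the grapes. [BLANK1] He jumped again and again. [BLANK2]",
    choices=[
        "At last he gave up and walked away.",
        "They hung high on the vine.",
        "They lay low on the ground.",  # fake candidate, similar to a correct one
    ],
    answers=[1, 0],
)
filled = fill_blanks(toy, toy.answers)
score = blank_accuracy([toy], [[2, 0]])  # first blank wrong (picked the fake), second right
```

A system that picks the fake candidate for the first blank gets one of the two blanks right here, which shows why the fake candidates force a judgment in context instead of a simple similarity match.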

Version v1.0
Creator Yiming Cui - iFLYTEK Research, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu, Wentao Ma, Ziqing Yang, Wanxiang Che
Distributor Yiming Cui - iFLYTEK Research
Rights Holder Yiming Cui - iFLYTEK Research