ISLRN

CMRC 2019 Dataset

Full Official Name: A Chinese Reading Comprehension Dataset for the 3rd Chinese Machine Reading Comprehension Evaluation (CMRC 2019)

Submission date: July 7, 2020, 3:24 p.m.

We propose a new task called Sentence Cloze-style Machine Reading Comprehension (SC-MRC). The task is to fill each blank in a passage with the correct sentence from a set of candidates. To make the task harder, we also add fake candidates that resemble the correct ones, so the machine must judge whether each candidate fits the context. The dataset contains over 100K blanks (questions) in over 10K passages drawn from Chinese narrative stories. To evaluate the dataset, we implement several baseline systems based on pre-trained models; the results show that the state-of-the-art model still underperforms humans by a large margin.
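To make the task format concrete, here is a minimal sketch of one SC-MRC instance and a per-blank accuracy score. Everything in it is an illustrative assumption rather than the dataset's documented schema: the field names (`context`, `choices`, `answers`), the `[BLANKn]` marker, the English toy passage, and the helper functions `fill_blanks` and `blank_accuracy` are all hypothetical.

```python
import re

# Hypothetical toy instance. Field names and the [BLANKn] marker are
# assumptions for illustration, not the dataset's documented schema.
instance = {
    "context": (
        "The fox saw grapes on a vine. [BLANK1] "
        "He jumped again and again. [BLANK2] "
        "At last he walked away."
    ),
    "choices": [
        "But he could not reach them.",  # index 0
        "He wanted to eat them.",        # index 1
        "He wanted to sell them.",       # index 2: fake candidate, fits no blank
    ],
    # Gold choice index for BLANK1 and BLANK2, in that order.
    "answers": [1, 0],
}


def fill_blanks(context, choices, picks):
    """Replace each [BLANKn] marker with the candidate chosen for blank n."""
    def substitute(match):
        blank_number = int(match.group(1))
        return choices[picks[blank_number - 1]]
    return re.sub(r"\[BLANK(\d+)\]", substitute, context)


def blank_accuracy(predicted, gold):
    """Fraction of blanks filled with the correct candidate."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    print(fill_blanks(instance["context"], instance["choices"], instance["answers"]))
    # A system that picks the fake candidate for the first blank.
    print(blank_accuracy([2, 0], instance["answers"]))
```

The fake candidate is never a gold answer, so a system that only matches surface similarity can be drawn to it and lose accuracy on that blank.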

Creator(s)

Ziqing Yang

Wanxiang Che

iFLYTEK Research - Yiming Cui

Ting Liu

Zhipeng Chen

Shijin Wang

Guoping Hu

Wentao Ma

Distributor(s)

iFLYTEK Research - Yiming Cui

Right Holder(s)

iFLYTEK Research - Yiming Cui

Status: Accepted

ISLRN:

813-010-842-493-2

Version

v1.0

Source

https://github.com/ymcui/cmrc2019

Resource Type

Primary Text

Media Type

Text

Language(s)

Chinese

Access Medium