Chinese Phonetic (Pinyin) Error Dataset

Full Official Name: Chinese Phonetic (Pinyin) Error Dataset
Submission date: March 13, 2026, 7:57 p.m.

A corpus of Chinese sentences with two kinds of programmatically introduced errors: phonetic errors (invalid Pinyin) and semantic errors (valid Pinyin). It is designed for evaluating the error-correction capabilities of Input Method Editors (IMEs) and is derived from the CSCD-IME corpus.
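The distinction between the two error types can be illustrated with a minimal sketch. It assumes a syllable-level view in which a corrupted string counts as a phonetic error when it is no longer a valid Pinyin syllable, and as a semantic error when it is still valid but differs from the original. The names `VALID_SYLLABLES`, `classify_error`, and `swap_adjacent` are hypothetical, the syllable table is a tiny sample, and none of this is taken from the dataset's actual generation procedure.

```python
# Hypothetical illustration only: not the dataset's real tooling.
# Tiny sample of valid Pinyin syllables (assumption; the full inventory is much larger).
VALID_SYLLABLES = {"ni", "hao", "zhong", "guo", "zong", "li", "ma"}


def swap_adjacent(syllable: str, i: int) -> str:
    """Swap characters i and i+1 to mimic a keystroke slip."""
    chars = list(syllable)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def classify_error(original: str, corrupted: str) -> str:
    """Label a corrupted syllable by whether it is still valid Pinyin."""
    if corrupted == original:
        return "none"
    if corrupted in VALID_SYLLABLES:
        return "semantic"  # valid Pinyin, but a different syllable
    return "phonetic"      # no longer a valid Pinyin syllable


# "hao" with its last two letters swapped is not a valid syllable.
print(classify_error("hao", swap_adjacent("hao", 1)))
# "zong" is valid Pinyin but differs from the intended "zhong".
print(classify_error("zhong", "zong"))
```

Under these assumptions, an invalid string could in principle be flagged by a syllable lookup alone, whereas a valid but wrong syllable generally needs sentence context to detect.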

Creator(s)
Distributor(s)
Right Holder(s)