Resource: Ancient Chinese Corpus

Reference Ancient Chinese Corpus
Date of Submission Oct. 23, 2017, 3:59 p.m.
Status accepted
ISLRN 924-985-704-453-5
Resource Type Primary Text
Media Type Text
Language Literary Chinese
Format/MIME Type text/plain
Size 1584
Access Medium Web Download


Ancient Chinese Corpus was developed at Nanjing Normal University. It contains word-segmented and part-of-speech tagged text from Zuozhuan, an ancient Chinese work believed to date from the Warring States Period (475-221 BC). Zuozhuan is a commentary on the Chunqiu, a history of the Chinese Spring and Autumn period (770-476 BC). This release is part of a continuing project to develop a large, part-of-speech tagged ancient Chinese corpus.


Ancient Chinese Corpus consists of 180,000 Chinese characters and 195,000 segment units (including words and punctuation). The part-of-speech tag set was developed by Nanjing Normal University and contains 17 tags.

This release contains two text files with 268 paragraphs and 10,560 lines. Each line is one sentence, and paragraphs are separated by one empty line. Words are separated by spaces, and each word is tagged with its part of speech.

The files are UTF-8 plain text in traditional Chinese script.
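
The layout described above (one sentence per line, blank lines between paragraphs, space-separated tagged words) could be read with something like the following minimal Python sketch. This record does not say how a word is joined to its tag, so the "/" separator is an assumption, as are the function name parse_corpus and the placeholder tags T1 and T2, which are not taken from the corpus's 17-tag set.

```python
from typing import List, Tuple

Token = Tuple[str, str]        # (word, part-of-speech tag)
Sentence = List[Token]
Paragraph = List[Sentence]


def parse_corpus(text: str, sep: str = "/") -> List[Paragraph]:
    """Split corpus text into paragraphs, sentences, and (word, tag) pairs.

    Assumption: each space-separated unit looks like word<sep>tag.
    The "/" default is hypothetical; check the files before relying on it.
    """
    paragraphs: List[Paragraph] = []
    current: Paragraph = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # An empty line closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
            continue
        sentence: Sentence = []
        for unit in line.split():
            # Split on the last separator so a word containing sep survives.
            word, found, tag = unit.rpartition(sep)
            if not found:
                # No separator present: keep the unit as an untagged word.
                word, tag = unit, ""
            sentence.append((word, tag))
        current.append(sentence)
    if current:
        paragraphs.append(current)
    return paragraphs


# Made-up sample in the assumed format; T1 and T2 are placeholder tags.
SAMPLE = "元年/T1 ，/T2\n春/T1 。/T2\n\n夏/T1 。/T2\n"

if __name__ == "__main__":
    parsed = parse_corpus(SAMPLE)
    print(len(parsed), "paragraphs")
    print(sum(len(p) for p in parsed), "sentences")
```

For the actual files, the text would be read with open(path, encoding="utf-8") before being passed to the function; if the files use a different word/tag separator, the sep argument is the only thing that should need changing.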

Version 1.0
Creator Xiaohe Chen, Bin Li, Minxuan Feng, Chao Xu, Runhua Xu, Min Shi, Lili Yu, Lei Xiao, Qingqing Wang
Distributor Linguistic Data Consortium
Rights Holder Portions © 2017 Xiaohe Chen, © 2017 Trustees of the University of Pennsylvania