Corpus of Law, Academic, and News

Full Official Name: Corpus of Law, Academic, and News
Submission date: Oct. 19, 2020, 10:56 p.m.

*Introduction* Corpus of Law, Academic, and News consists of 400 Persian documents divided into three genres: legal, academic, and news. The legal section contains texts from official publications, including the civil penal code, the criminal penal code, and the constitution of the Islamic Republic of Iran. The academic sub-corpus comprises published academic abstracts from various disciplinary areas, such as Arts and Humanities, Social Sciences, and Natural Sciences. The news sub-corpus was extracted from an archive of ten Iranian news outlets covering the period 2010-2020.

*Data* The document and token counts are as follows: 48 legal documents, 88,170 tokens; 274 academic documents, 85,765 tokens; and 78 news documents, 101,055 tokens. Each document carries metadata in the file's header, including the specific text type, dates, and source, and is annotated to mark the title and body paragraphs. All documents are presented as UTF-8 encoded XML with internal DTDs.
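As a quick consistency check on the figures above, the per-genre counts can be totalled and compared. The following is a minimal sketch: the numbers are taken from the description, while the names `COUNTS` and `summarize` are illustrative and not part of the corpus distribution.

```python
# Per-genre document and token counts, as stated in the description above.
COUNTS = {
    "legal": {"documents": 48, "tokens": 88_170},
    "academic": {"documents": 274, "tokens": 85_765},
    "news": {"documents": 78, "tokens": 101_055},
}


def summarize(counts):
    """Return total documents, total tokens, and per-genre derived figures."""
    total_docs = sum(g["documents"] for g in counts.values())
    total_tokens = sum(g["tokens"] for g in counts.values())
    per_genre = {}
    for genre, g in counts.items():
        per_genre[genre] = {
            # Fraction of all corpus tokens contributed by this genre.
            "token_share": g["tokens"] / total_tokens,
            # Average document length in tokens for this genre.
            "mean_tokens_per_doc": g["tokens"] / g["documents"],
        }
    return total_docs, total_tokens, per_genre


total_docs, total_tokens, per_genre = summarize(COUNTS)
print(f"documents: {total_docs}, tokens: {total_tokens}")
for genre, stats in per_genre.items():
    print(
        f"{genre}: {stats['token_share']:.1%} of tokens, "
        f"{stats['mean_tokens_per_doc']:.0f} tokens per document on average"
    )
```

The document total matches the 400 documents given in the introduction. The genres are roughly balanced by token count rather than by document count, so the academic abstracts are far shorter on average than the legal and news texts.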

Creator(s)
Distributor(s)
Right Holder(s)