Full Official Name: NPChunks
Submission date: Jan. 20, 2016, 11:58 a.m.

NPChunks is a training corpus of approximately 1,000 sentences (24,243 tokens) selected at random from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, including punctuation, and noun phrases are annotated with specific tags. It was automatically PoS-tagged with the MBT tagger and lemmatized with MBLEM, following the annotation scheme of the Corpus of Reference of Contemporary Portuguese. The YamCha software was used to recognize noun phrase chunks and to identify the elements appearing at the beginning, in the middle and at the end of each noun phrase.
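As an illustration of how begin/middle/end annotations of this kind can be turned back into noun phrase spans, the following is a minimal Python sketch. The tag labels ("B-NP", "I-NP", "E-NP", "O"), the function name and the example sentence are assumptions made for the illustration only; the actual tag inventory and file format of NPChunks are not specified in this description and may differ.

```python
from typing import List, Tuple

# Hypothetical tag labels, assumed for illustration only.
BEGIN, MIDDLE, END = "B-NP", "I-NP", "E-NP"


def extract_noun_phrases(
    tokens: List[str], tags: List[str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for each noun phrase span."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    phrases = []
    start = None  # index where the currently open phrase began, if any

    for i, tag in enumerate(tags):
        if tag == BEGIN:
            if start is not None:
                # Previous phrase had no explicit end tag; close it here.
                phrases.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == MIDDLE:
            if start is None:
                start = i  # tolerate a phrase that lacks its begin tag
        elif tag == END:
            if start is None:
                start = i
            phrases.append((start, i + 1, " ".join(tokens[start:i + 1])))
            start = None
        else:
            # Any other tag is treated as outside a noun phrase.
            if start is not None:
                phrases.append((start, i, " ".join(tokens[start:i])))
                start = None

    if start is not None:
        # Sentence ended while a phrase was still open.
        phrases.append((start, len(tokens), " ".join(tokens[start:])))
    return phrases


# Made-up example sentence, not taken from the corpus.
example_tokens = ["O", "gato", "preto", "dorme", "."]
example_tags = ["B-NP", "I-NP", "E-NP", "O", "O"]
example_phrases = extract_noun_phrases(example_tokens, example_tags)
```

The sketch closes an open phrase whenever it meets a new begin tag or a token outside any phrase, so a noun phrase consisting of a single token is still recovered even if it carries only a begin tag.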

