Learner Corpus of Portuguese L2 – COPLE2

The Learner Corpus of Portuguese as Second/Foreign Language (COPLE2) is a corpus of written and oral texts produced by students of Portuguese as a Foreign/Second Language at the Instituto de Cultura e Língua Portuguesa (Institute of Portuguese Language and Culture, ICLP – FLUL) and by candidates sitting examinations at the Centro de Avaliação de Português Língua Estrangeira (Center for Evaluation of Portuguese as a Foreign Language, CAPLE – FLUL). The corpus contains texts from learners with 15 different native languages (L1s) and proficiency levels from A1 to C1, and it covers a range of topics and task types.

The corpus is encoded in TEI format through the TEITOK environment. It currently comprises 978 texts and 182,474 tokens, classified according to the CEFR scales. Each learner text is accompanied by complete metadata on the learner profile, the type of task and the circumstances in which the text was produced. The corpus is annotated for part of speech, lemma and learner errors, and all of the encoded information is searchable through the CQP query language.

The corpus was funded by Fundação Calouste Gulbenkian, within the RECAP and LeCIEPLE projects, by Associação para o Desenvolvimento da Faculdade de Letras da Universidade de Lisboa (ADFLUL), and by Instituto de Cultura e Língua Portuguesa. It is the result of a partnership between several institutions, including ICLP, CAPLE and Centro de Linguística da Universidade de Lisboa (CLUL).
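To illustrate what a search over the annotated layers might look like, the following is a minimal Python sketch that assembles a query string in CQP syntax. The helper functions are hypothetical, and the attribute names `lemma` and `pos`, as well as the tag value `ADJ.*`, are assumptions made for illustration; the attribute names and tagset actually exposed by COPLE2 may differ.

```python
def cqp_token(**constraints):
    """Build one CQP token expression, e.g. [lemma="ser" & pos="V.*"].

    Attribute names are passed as keyword arguments. They are assumed
    here for illustration and are not confirmed COPLE2 attribute names.
    Values are inserted as-is, so they must not contain double quotes.
    """
    if not constraints:
        return "[]"  # empty brackets match any token in CQP
    parts = [f'{attr}="{value}"' for attr, value in constraints.items()]
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens):
    """Join token expressions into a query over adjacent tokens."""
    return " ".join(tokens)


# A form of the lemma "ser" immediately followed by an adjective
# (hypothetical tag pattern).
query = cqp_sequence(cqp_token(lemma="ser"), cqp_token(pos="ADJ.*"))
print(query)
```

The resulting string would then be pasted into the corpus search interface. Building it from small pieces simply keeps the bracket and quoting conventions of CQP in one place.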

