MCL - Multifunctional Computational Lexicon of Contemporary Portuguese

MCL is a frequency lexicon of 26,443 lemmas and 140,315 tokens, with a minimum lemma frequency of 6, extracted from CORLEX, a corpus of contemporary Portuguese (16,210,438 words). CORLEX is a subcorpus of the Reference Corpus of Contemporary Portuguese and contains written and spoken texts of several types; genre diversity is a characteristic of this corpus. CORLEX consists mainly of journalistic texts (56% of the written corpus and 53% of the whole corpus). To extract the lexicon, all the distinct lexical forms occurring in the corpus were indexed, then morphosyntactically tagged and lemmatised by PALAVROSO. Each lemma in MCL is followed by morphosyntactic and quantitative information. The same information is given for each token of a lemma (inflected forms and some compounds). The lexicon entries are listed either in alphabetical order or in order of decreasing frequency.
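The extraction steps described above (index the forms, group them under lemmas, apply the frequency threshold, list the result in either order) can be illustrated with a minimal Python sketch. It is not the procedure actually used to build MCL: the toy counts, the `form_to_lemma` mapping (standing in for the output of a lemmatiser such as PALAVROSO) and all function names are hypothetical, and the sketch assumes that the threshold of 6 applies to the summed frequency of a lemma's forms.

```python
from collections import Counter

MIN_LEMMA_FREQ = 6  # minimum lemma frequency stated in the description


def build_lexicon(form_counts, form_to_lemma, min_freq=MIN_LEMMA_FREQ):
    """Group form counts under their lemmas and drop lemmas below min_freq.

    form_to_lemma is a hypothetical stand-in for a lemmatiser's output.
    """
    lemma_freq = Counter()
    forms_by_lemma = {}
    for form, count in form_counts.items():
        # Assumption: an unknown form is treated as its own lemma.
        lemma = form_to_lemma.get(form, form)
        lemma_freq[lemma] += count
        forms_by_lemma.setdefault(lemma, {})[form] = count
    return {
        lemma: {"freq": freq, "forms": forms_by_lemma[lemma]}
        for lemma, freq in lemma_freq.items()
        if freq >= min_freq
    }


def alphabetical(lexicon):
    """Lemmas in code-point order (not Portuguese collation)."""
    return sorted(lexicon)


def by_decreasing_frequency(lexicon):
    """Lemmas by decreasing frequency, ties broken alphabetically."""
    return sorted(lexicon, key=lambda lemma: (-lexicon[lemma]["freq"], lemma))


# Toy data, invented for illustration only.
form_counts = Counter({"casa": 4, "casas": 3, "falar": 2, "falou": 6, "raro": 1})
form_to_lemma = {
    "casa": "casa", "casas": "casa",
    "falar": "falar", "falou": "falar",
    "raro": "raro",
}

lexicon = build_lexicon(form_counts, form_to_lemma)
print(alphabetical(lexicon))
print(by_decreasing_frequency(lexicon))
```

In this toy run the lemma "raro" falls below the threshold and is dropped, while each retained lemma keeps the counts of its inflected forms alongside its total frequency.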

