Resource: GLiCom Spanish Wordform list - Regular word-forms + verb-clitic combinations

Reference GLiCom Spanish Wordform list - Regular word-forms + verb-clitic combinations
Date of Submission Nov. 2, 2015, 6:24 p.m.
Status accepted
ISLRN 529-126-116-826-1
Resource Type Lexicon
Media Type Text
Language Spanish, Castilian
Access Medium CD-ROM

GLiCom Spanish Wordform List v.1 is a computational lexicon of inflected Spanish wordforms. Each entry provides (i) the lemma, (ii) a morphosyntactic tag, and (iii) the word type. The lexicon can be used in any Spanish text-analysis application, in particular those that need a lemmatizer, a POS tagger or a named-entity recogniser.
The lexicon is distributed as two sublexicons:
1- wordforms
2- verb-clitic combinations

The list of wordforms contains 1,152,242 entries, including (i) regular words (1,144,086), (ii) toponyms and anthroponyms (8,032), (iii) abbreviations and acronyms (775), and (iv) computational terms (124). Each entry consists of the form, lemma, morphosyntactic tag and word type.
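As an illustration of the lemmatizer use case mentioned above, the sketch below indexes wordform entries by surface form so that each token maps to all of its candidate analyses. The file layout is an assumption, not taken from the distribution: it supposes one entry per line with four tab-separated fields (form, lemma, tag, word type). The sample lines, tag strings (`TAG_...`) and word-type labels are hypothetical placeholders, not the lexicon's actual tagset.

```python
from collections import defaultdict
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    tag: str
    word_type: str


def load_wordforms(lines):
    """Build a form -> list-of-analyses index.

    Assumption (hypothetical layout): one entry per line, four
    tab-separated fields: form, lemma, morphosyntactic tag, word type.
    """
    index = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        form, lemma, tag, word_type = line.split("\t")
        index[form].append(Analysis(lemma, tag, word_type))
    return index


def lemmatize(index, token):
    """Return the distinct candidate lemmas of a token, in file order.

    Ambiguous forms yield several lemmas; unknown forms yield [].
    """
    lemmas = []
    for analysis in index.get(token, []):  # .get avoids creating empty keys
        if analysis.lemma not in lemmas:
            lemmas.append(analysis.lemma)
    return lemmas


# Hypothetical sample entries with placeholder tags.
sample = [
    "casas\tcasa\tTAG_NOUN_FP\tregular",
    "canto\tcanto\tTAG_NOUN_MS\tregular",
    "canto\tcantar\tTAG_VERB_1S\tregular",
    "Madrid\tMadrid\tTAG_PROPER\ttoponym",
]
index = load_wordforms(sample)
print(lemmatize(index, "canto"))            # ['canto', 'cantar']
print([a.tag for a in index["casas"]])      # ['TAG_NOUN_FP']
```

Keeping every analysis per form, rather than a single lemma, leaves disambiguation (e.g. noun *canto* vs. verb *cantar*) to a downstream POS tagger.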

The list of verb-clitic combinations contains 4,283,637 entries, exhaustively covering all formal combinations (including infinitive, gerund and imperative forms). Note that some clitic combinations may be formally possible but semantically implausible. Each entry consists of the form, the lemma of the verb, and the combination of the morphosyntactic tags of the verb and the pronoun(s).

Version 1.0
Distributor ELRA