ISLRN

JRC-Names

Full Official Name: JRC-Names

Submission date: Oct. 3, 2014, 4:37 p.m.

JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many spelling variants (up to hundreds for a single person), including variants across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). The resource file with the list of spelling variants is accompanied by demonstrator software, implemented in Java, that (a) produces a list of known spelling variants for any input name, and (b) analyses UTF-8-encoded text files to find known entity mentions, returning the name variant found, the preferred display name for that entity, the unique name identifier, the position of the entity name in the text, and its length in characters. Most of the names were identified in real-life news articles through named entity recognition, and most spelling variants were linked to the main name spelling automatically. The list of names is updated every day with newly found names and their variants.
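The two functions of the demonstrator software described above can be illustrated with a minimal Python sketch. This is not the JRC tool, which is written in Java, and it does not read the actual resource file: the record layout, the entity identifiers, the sample names, and the function names build_index, known_variants, and find_mentions are all assumptions made for illustration. The sketch also assumes that the first variant listed for an entity is its preferred display name, and it uses plain substring matching.

```python
from collections import defaultdict

# Hypothetical sample records as (entity_id, spelling_variant) pairs.
# The identifiers and this layout are illustrative, not the real file format.
SAMPLE_RECORDS = [
    (1, "Angela Merkel"),
    (1, "Ангела Меркель"),
    (2, "Ban Ki-moon"),
    (2, "Пан Ги Мун"),
]


def build_index(records):
    """Group variants by entity and map each variant back to its entity."""
    variants_by_id = defaultdict(list)
    id_by_variant = {}
    for entity_id, variant in records:
        variants_by_id[entity_id].append(variant)
        # casefold() makes the name lookup case-insensitive
        id_by_variant[variant.casefold()] = entity_id
    return variants_by_id, id_by_variant


def known_variants(name, variants_by_id, id_by_variant):
    """(a) Return all known spelling variants for an input name."""
    entity_id = id_by_variant.get(name.casefold())
    return list(variants_by_id.get(entity_id, []))


def find_mentions(text, variants_by_id):
    """(b) Find known entity mentions in a text (case-sensitive substring search)."""
    mentions = []
    for entity_id, variants in variants_by_id.items():
        preferred = variants[0]  # assumption: first variant is the display name
        for variant in variants:
            start = text.find(variant)
            while start != -1:
                mentions.append({
                    "variant": variant,
                    "preferred_name": preferred,
                    "entity_id": entity_id,
                    "position": start,       # offset in characters
                    "length": len(variant),  # length in characters
                })
                start = text.find(variant, start + 1)
    return sorted(mentions, key=lambda m: m["position"])
```

Calling build_index(SAMPLE_RECORDS) once and passing the resulting dictionaries to known_variants or find_mentions mirrors the two modes of use. A real lookup over a resource of this size would more plausibly rely on a trie or a similar multi-pattern matcher than on repeated substring searches.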

Creator(s)

European Commission - Joint Research Centre (JRC)

Distributor(s)

European Commission - Joint Research Centre (JRC) - Ralf Steinberger

Right Holder(s)

European Union (EU)

Status: Accepted

ISLRN

328-863-023-410-2

Version

1.0

Source

https://ec.europa.eu/jrc/en/language-technologies/jrc-names

Resource Type

Lexicon

Media Type

Text

Language(s)

Arabic

Bulgarian

Chinese

Danish

Dutch

English

Estonian

French

Georgian

German

Hebrew

Hindi

Italian

Japanese

Korean

Modern Greek

Norwegian

Polish

Portuguese

Romanian

Russian

Slovenian

Spanish

Swahili

Swedish

Thai

Turkish

Access Medium

Files For Download, Incl. Software