|Date of Submission||Oct. 3, 2014, 4:37 p.m.|
|Language||Arabic, Bulgarian, Chinese, Danish, Dutch, English, Estonian, French, Georgian, German, Modern Greek (1453-), Hebrew, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swahili, Swedish, Thai, Turkish|
|Size||611,000 (as of 1 October 2014)|
|Access Medium||files for download, incl. software|
JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many spelling variants (up to hundreds for a single person), including across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.).
The named entity resource file with the list of spelling variants is accompanied by demonstrator software implemented in Java that (a) produces a list of known spelling variants for any input name, and (b) analyses UTF-8-encoded text files to find known entity mentions, returning the name variant found, the preferred display name for that entity, the unique name identifier, the position of the entity name in the text, and its length in characters.
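The two functions of the demonstrator can be illustrated with a minimal sketch. This is not the JRC software or its API: the in-memory table, the invented entity "Anna Example", the identifier 1, and the names `ENTITIES`, `Mention`, `variants_of` and `find_mentions` are all assumptions made for illustration, and the layout of the real resource file is not described here.

```python
from dataclasses import dataclass

# Hypothetical toy table: entity id -> (preferred display name, spelling variants).
# The entity and its variants are invented; the real file format may differ.
ENTITIES = {
    1: ("Anna Example", ["Anna Example", "Ana Example", "Анна Экзампл"]),
}


@dataclass(frozen=True)
class Mention:
    variant: str        # the name variant found in the text
    display_name: str   # preferred display name of the entity
    entity_id: int      # unique identifier of the entity
    offset: int         # position of the variant in the text, in characters
    length: int         # length of the variant, in characters


def variants_of(name):
    """Return every known spelling variant of the entity that `name` belongs to."""
    for _display, variants in ENTITIES.values():
        if name in variants:
            return list(variants)
    return []


def find_mentions(text):
    """Find every occurrence of every known variant by plain substring search."""
    mentions = []
    for entity_id, (display, variants) in ENTITIES.items():
        for variant in variants:
            start = text.find(variant)
            while start != -1:
                mentions.append(
                    Mention(variant, display, entity_id, start, len(variant))
                )
                start = text.find(variant, start + 1)
    return sorted(mentions, key=lambda m: m.offset)


if __name__ == "__main__":
    print(variants_of("Ana Example"))
    for mention in find_mentions("Yesterday Ana Example met Анна Экзампл."):
        print(mention)
```

Plain substring search is used only to keep the sketch short; it ignores word boundaries and case, which a real matcher would presumably need to handle.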
The names were mostly identified in real-life news articles through named entity recognition, and most spelling variants were added to the main name spelling automatically.
The list of names is updated daily with newly found names and their variants.
|Creator||European Commission - Joint Research Centre (JRC)|
|Distributor||Ralf Steinberger - European Commission - Joint Research Centre (JRC)|
|Rights Holder||European Union (EU)|