ISLRN

Spanish-English website parallel corpus (Processed)

Full Official Name: Spanish-English website parallel corpus (Processed)

Submission date: March 2, 2020, 5:25 p.m.

This dataset was created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu.

This is a parallel corpus of bilingual texts crawled from multilingual websites. It contains 21,007 translation units (TUs). Period of crawling: 15/11/2016 - 23/01/2017.

A strict validation process was followed, which resulted in discarding:

- TUs from crawled websites that do not comply with the PSI directive,
- TUs with more than 99% of misspelled tokens,
- TUs identified during the manual validation process, and all the TUs from websites whose error rate in the sample extracted for manual validation is strictly above the following thresholds: 50% of TUs with language identification errors, 50% of TUs with alignment errors, 50% of TUs with tokenization errors, 20% of TUs identified as machine translated content, 50% of TUs with translation errors.
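The threshold rules above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the validation tooling actually used for the corpus, which the record does not describe. The names `THRESHOLDS`, `website_is_rejected` and `tu_is_mostly_misspelled` are hypothetical. The sketch assumes that exceeding any single threshold is enough to reject a website, and that an empty TU is not counted as misspelled.

```python
# Thresholds as listed in the description; rates are fractions of the
# manually validated sample. All names here are hypothetical.
THRESHOLDS = {
    "language_identification": 0.50,
    "alignment": 0.50,
    "tokenization": 0.50,
    "machine_translated": 0.20,
    "translation": 0.50,
}


def website_is_rejected(sample_error_rates):
    """Return True if any sampled error rate is strictly above its threshold.

    Assumption: one exceeded threshold is enough to discard the website.
    Error types missing from the input are treated as a rate of 0.0.
    """
    return any(
        sample_error_rates.get(name, 0.0) > limit
        for name, limit in THRESHOLDS.items()
    )


def tu_is_mostly_misspelled(misspelled_tokens, total_tokens, limit=0.99):
    """Return True if more than `limit` of a TU's tokens are misspelled."""
    if total_tokens == 0:
        # Assumption: an empty TU is not flagged by this rule.
        return False
    return misspelled_tokens / total_tokens > limit


if __name__ == "__main__":
    print(website_is_rejected({"alignment": 0.50}))           # at threshold
    print(website_is_rejected({"machine_translated": 0.21}))  # above threshold
    print(tu_is_mostly_misspelled(100, 100))
```

Because the comparison is strict, a website whose sample shows exactly 50% alignment errors is kept, while one with 21% machine translated content is discarded.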

Creator(s)

Distributor(s)

ELRA

Right Holder(s)

Ministry of Foreign and European Affairs of the Slovak Republic

Ministerio de Hacienda y Administraciones Publicas

Direcao Regional de Cultura do Norte

The British Library

Ministerio dos Negocios Estrangeiros

Polizia di Stato, Ministero dell’Interno

Diputación de Granada

Gobierno de la Rioja

Diputación Foral de Bizkaia

Diputació de Castelló

Consell de Mallorca.net

Instituto Nacional de Estadística

Presidenza della Repubblica

Sligo county council

Agencia Tributaria, Gobierno de España

Instituto dos Vinhos do Douro e do Porto, I. P.

Consejería de Empleo, Empresa y Comercio, Junta de Andalucía

Gobierno de Aragón

Dirección General de Seguros y Fondos de Pensiones

Ministerio de Economía y Competitividad

Gobierno del Principado de Asturias

Diputació de València

Ministerio de Sanidad, Servicios Sociales e Igualdad

Junta General del Principado de Asturias

Agencia espanola de consumo, seguridad alimentaria y nutricion

Ministerio de Industria, Energía y Turismo

Gobierno de Aragon

German Foreign Ministry

Consejo Superior de Deportes, Ministerio de Educación, Cultura y Deporte

Consejeria de Empleo, Empresa y Comercio, Junta de Andalucia

Turismo de Portugal, I.P.

Asamblea de Madrid

Direccion General del Catastro

Ministero della Difesa

Agencia Estatal de Evaluacion de las Politicas Publicas y la Calidad de los Servicios

Director of Public Prosecutions

Ministerio de Educacion, Cultura y Deporte

Ministerio de Asuntos Exteriores y Cooperacion

Diputacio de Castello

Instituto Nacional de Estadistica

Status : Accepted

ISLRN :

664-503-904-200-9

Version

2.0

Source

http://catalog.elra.info/en-us/repository/browse/ELRA-W0248

Resource Type

Primary Text

Media Type

Text

Language(s)

English

Spanish

Access Medium

Downloadable