Resource: ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1

Reference ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1
Date of Submission Dec. 21, 2018, 2:43 p.m.
Status accepted
ISLRN 036-939-425-010-1
Resource Type Primary Text
Media Type Text
Source
Language English, Spanish
Format/MIME Type XML
Access Medium Downloadable
Description

The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaume I (Spain), is a collection of XML corpora with metatextual tagging, containing speeches from three European chambers: the European Parliament, the British House of Commons, and the Spanish Congreso de los Diputados. It is a bilingual, bidirectional written corpus in English and Spanish, described by Zanettin (2012). This first set (ECPC_EP-05) consists of (1) a "clean" XML version of the European Parliament's 2005 daily sessions; (2) a POS-tagged version of the 2005 daily sessions; and (3) a sentence-aligned version of the 2005 daily sessions. In its raw format, ECPC_EP-05 contains 3,668,476 tokens (words, excluding tagging) in English, distributed over 60 UTF-8 files, and 3,993,867 tokens (words, excluding tagging) in Spanish, distributed over 60 UTF-8 files.

Version 1.0
Distributor ELRA