deL1L2IM corpus

Full Official Name: deL1L2IM corpus
Submission date: March 20, 2015, 3:49 p.m.

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation of instant messaging dialogues on a long-term basis (four months) between advanced learners of German and German native speakers, chatting about whatever topic they wish. The dataset is composed of 72 dialogues, each of them having a duration of 20 to 45 minutes. The whole corpus contains ca. 52,000 words and 4,800 messages and has a file size of 0.5 Mb. Nine pairs of participants – i.e. nine learners and four native speakers – were required, with 8 dialogues per pair. The interactions have undergone linguistic analysis whereby the annotation will be performed only on repair/correction sequences (incomplete learner error annotation). The goal of the project was to create an application for language modelling and to improve learner language applications, tutoring software and dialogue systems. The corpus is delivered in one written text file (in XML format, customized under TEI P5).

Creator(s)
Distributor(s)
Right Holder(s)