Resource: BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
|Reference||BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech|
|Date of Submission||Sept. 17, 2020, 8:13 p.m.|
|Resource Type||Primary Text|
|Access Medium||Web Download|
BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the Computational Language and Education Research (CLEAR) group at the University of Colorado Boulder. It consists of PropBank and verb sense disambiguation annotation of English discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data.
The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval technology for less formal genres, focusing particularly on user-generated content. The Linguistic Data Consortium (LDC) supported the BOLT program by collecting informal data sources (discussion forums, text messaging, and chat) in Chinese, Egyptian Arabic, and English. The collected data was translated and annotated for various tasks, including word alignment, treebanking, propbanking, and co-reference.
DF data was collected from the web using a combination of manual and automatic processes. SMS/Chat material was either donated or collected via live platforms. CTS data was taken from LDC's Arabic and Chinese CALLHOME and CALLFRIEND telephone collections; the audio files were transcribed, and the transcripts were translated into English.
PropBank annotation and verb sense disambiguation were applied to the BOLT phrase structure treebank annotation, specifically to each predicate verb in a tree. PropBank annotation, which adds a layer of semantic annotation over the treebank, was performed on all three genres. DF and SMS/Chat data were also annotated for verb sense disambiguation using VerbNet 3.2 classes.
Annotation files are UTF-8 encoded and are in either plain text or XML format.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
|Creator||Martha Palmer, Tim O'Gorman, Claire Bonial, Jena D. Hwang, James Gung, Kevin Stowe, Meredith Green|
|Distributor||Linguistic Data Consortium|
|Rights Holder||Portions © 1996, 1997, 2011-2020 Trustees of the University of Pennsylvania|