ISLRN

AIDA Scenario 2 Practice Topic Source Data

Full Official Name: AIDA Scenario 2 Practice Topic Source Data

Submission date: April 15, 2024, 11:10 p.m.

AIDA Scenario 2 Practice Topic Source Data was developed by the Linguistic Data Consortium (LDC) and comprises 1500 root documents, including text, image, and video, from English, Russian, and Spanish web sources. The DARPA AIDA (Active Interpretation of Disparate Alternatives) program aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations and trends from a variety of unstructured sources. LDC supported AIDA by collecting, creating and annotating multimodal linguistic resources in multiple languages. Each phase of the AIDA program centered on a specific scenario, or broad topic area, with related subtopics designated as either practice subtopics or evaluation subtopics. The Phase 2 scenario focused on the socioeconomic and political crisis in Venezuela since 2010. This corpus constitutes the full set of topic-focused documents for Phase 2 practice subtopics. Data was collected from web sources by a combination of automatic and manual processes. HTML content was converted from its original form into XML. To the extent possible, all resources referenced by a given "root" HTML page (style sheets, JavaScript, images, media files, etc.) were stored as separate files of the given data type and assigned separate 9-character file-IDs (the same form of ID used for the "root" HTML page). The knowledge base for entity detection and linking annotation for all AIDA Scenario 1 and 2 corpora is available separately as AIDA Scenario 1 and 2 Reference Knowledge Base (LDC2023T10).

Creator(s)

Jennifer Tracey

Stephanie Strassel

Jeremy Getman

Ann Bies

Kira Griffitt

David Graff

Christopher Caruso

Distributor(s)

Linguistic Data Consortium

Right Holder(s)

Portions © 2015 21st Century Wire, © 2020 ABC, © 2013 ABC News Internet Ventures, © 2014, 2017-2018 Alba Ciudad 96.3 FM, © 2017 AL DÍA NEWS Media, © 2017-2018 Al Jazeera Media Network, © 2018 AméricaEconomía, © 2019 American Association for the Advancement of Science, © 2019 Americas Society/Council of the Americas, © 2020 AMX Content SA de CV, © 2014, 2017 Arguments and Facts JSC, © 2014 ARMENPRESS, © 2018 Authorized by the Chief Agent, CPC, © 2014, 2017-2018 Autonomous Nonprofit Organization “TV-Novosti”, © 2013-2014, 2018-2019 BBC, © 2015, 2017-2018 Bellingcat, © 2019 Breitbart, © 2018 Business capital, © 2020 business/media bureau ekonomika, © 2019-2020 C.A. IBERONEWS LIMITED, © 2018-2020 C.A. The Universe, © 2013, 2017 Cable News Network. Turner Broadcasting System, Inc., © 2017 Caracas Chronicles, © 2018 Caracol SA, © 2018 CARACOL TELEVISIÓN SA, © 2013, 2017 CBC/Radio-Canada, © 2013 CBS Interactive Inc., © 2020 CDN, © 2017 Center for Democracy in the Americas, © 2014-2015 Channel

Status: Accepted

ISLRN:

484-106-854-383-0

Version

1.0

Source

https://catalog.ldc.upenn.edu/LDC2024T04

Resource Type

Primary Text

Media Type

Audio

Image

Text

Video

Language(s)

English

Russian

Spanish

Access Medium