Italian Syntactic-Semantic Treebank (ISST)

Full Official Name: Italian Syntactic-Semantic Treebank (ISST)
Submission date: Jan. 24, 2014, 4:29 p.m.

ISST comprises 305,547 tokens: 89,941 in the financial-domain part and 215,606 in the general part. It is formatted in XML. ISST has a five-level structure covering the orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over two levels: the constituent structure level and the functional relations level. The fifth level holds the lexico-semantic annotation, which consists of sense tagging of lexical heads (nouns, verbs and adjectives) augmented with other types of semantic information; ItalWordNet (see ELRA-M0018) is the reference lexical resource for the sense tagging task. Both the syntactic and the lexico-semantic annotations refer to the morpho-syntactically annotated text, which in turn is linked to the orthographic file containing the text and the mark-up of its macrotextual organisation (e.g. titles, subtitles, summary, body of article, paragraphs).
The multi-level structure of ISST introduces two main novelties with respect to other treebanks:
1) While most treebanks are restricted to syntactic annotation, ISST includes both syntactic and semantic annotation levels. This sets up the prerequisites for corpus-based investigations of the syntax-semantics interface: linking the syntactic and semantic annotation layers makes it possible, for instance, to identify the subcategorisation properties associated with a specific word sense, or the semantic types associated with the functional positions of a given predicate.
2) ISST takes a distributed approach to syntactic annotation. In this respect it differs from most treebanks, which adopt a single syntactic representation layer.
ISST also differs from multi-level treebanks such as the Prague Dependency Treebank (PDT): whereas the PDT annotation levels refer respectively to a) the surface dependency relations and b) the underlying sentence structure, the ISST syntactic annotation levels are intended to provide orthogonal views of the same surface syntax.
The adopted morpho-syntactic annotation scheme conforms to the EAGLES international standard.
ISST constituency annotation departs from other constituency-based syntactic annotation schemes (e.g. the one adopted in the Penn Treebank) in a number of respects, mainly because of the distributed organisation of syntactic annotation. Annotation at this level consists of identifying phrase boundaries and labelling constituent types; since functional relations are handled at a distinct level, ISST tree structures are shallow.
The ISST functional annotation scheme is based on FAME (Lenci et al. 1999, 2000), whose main features can be summarised as follows: a) a hierarchical organisation of functional relations, which makes provision for underspecified representations of highly ambiguous functional analyses; b) a modular coding architecture articulated over different information layers, each factoring out different but possibly interrelated linguistic facets of syntactic annotation. FAME originated as a revision of a de facto standard, namely the functional annotation scheme developed in the framework of the LE-2111 SPARKLE project. The revision was carried out first to comply better with the basic requirements of parsing evaluation (in the framework of the LE-8340 ELSE project), and then to make the scheme suitable for the annotation of unrestricted Italian texts.
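The linking of annotation levels described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names, field names, relation labels and sense identifiers are all hypothetical and do not reproduce the actual ISST XML format, the FAME relation inventory or ItalWordNet identifiers. It shows how functional relations and sense tags that both point to the same morpho-syntactic tokens can be joined to collect the relation frames observed for each word sense.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Morpho-syntactic level: the anchor that the other levels point to."""
    tid: str
    form: str
    lemma: str
    pos: str


@dataclass(frozen=True)
class FunctionalRelation:
    """Functional level: a labelled relation between two token ids."""
    rel: str
    head: str
    dependent: str


@dataclass(frozen=True)
class SenseTag:
    """Lexico-semantic level: a sense id attached to a lexical head."""
    tid: str
    sense: str


def frames_by_sense(tokens, relations, senses):
    """Map each (lemma, sense) pair to the relation sets its occurrences govern."""
    sense_of = {s.tid: s.sense for s in senses}
    rels_of_head = defaultdict(list)
    for r in relations:
        rels_of_head[r.head].append(r.rel)
    frames = defaultdict(set)
    for t in tokens:
        if t.tid in sense_of:
            frame = tuple(sorted(rels_of_head.get(t.tid, [])))
            frames[(t.lemma, sense_of[t.tid])].add(frame)
    return dict(frames)


# Toy data for "Il governo approva la legge".
# All tags, relation labels and sense ids below are invented for illustration.
TOKENS = [
    Token("t1", "Il", "il", "DET"),
    Token("t2", "governo", "governo", "NOUN"),
    Token("t3", "approva", "approvare", "VERB"),
    Token("t4", "la", "la", "DET"),
    Token("t5", "legge", "legge", "NOUN"),
]
RELATIONS = [
    FunctionalRelation("subj", "t3", "t2"),
    FunctionalRelation("obj", "t3", "t5"),
]
SENSES = [SenseTag("t3", "approvare#1")]

frames = frames_by_sense(TOKENS, RELATIONS, SENSES)
```

Because every level refers to token identifiers rather than embedding the other levels, the join needs nothing more than a lookup on the shared ids; the same pattern would extend to reading the semantic types of the dependents filling each functional position.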
References:
Lenci A., Montemagni S., Pirrelli V., Soria C., FAME: a Functional Annotation Meta-scheme for Multimodal and Multi-lingual Parsing Evaluation, in Proceedings of the ACL99 Workshop on Computer-Mediated Language Assessment and Evaluation in NLP, University of Maryland, June 22nd 1999.
Lenci A., Montemagni S., Pirrelli V., Soria C., Where opposites meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation, in Proceedings of LREC-2000, 31/5-2/6 2000, Athens, 625-632.
Articles describing ISST:
Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Antonio Zampolli, Francesca Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, Rodolfo Delmonte, “Building the Italian Syntactic-Semantic Treebank”, in Anne Abeillé (ed.), Building and using Parsed Corpora, Language and Speech series, Kluwer, Dordrecht, pp. 189-210.
Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci, Vito Pirrelli, Antonio Zampolli, Francesca Fanciulli, Maria Massetani, Remo Raffaelli, Roberto Basili, Maria Teresa Pazienza, Dario Saracino, Fabio Zanzotto, Nadia Mana, Fabio Pianesi, Rodolfo Delmonte, 2003, “The syntactic-semantic treebank of Italian. An overview”, Linguistica Computazionale XVI-XVII, pp. 461-492.
