Processing the ITU vocabulary: revisions and adaptations to the Pisa syntactic-semantic parser

Peters C; Montemagni S
1993

Abstract

The first version of the Pisa syntactic-semantic parser was described in detail in Deliverable 4, Section 2 and Appendices 2, 3, and 4. This report discusses the testing of the parser on the sample vocabulary selected from the ITU Corpus (see Deliverable 6.1) and illustrates the revisions and extensions that are now being implemented. It therefore concentrates on analysis and extraction activities: we need to specify clearly all the kinds of information that can be extracted from the Cobuild definitions before completing the description of the type system that will be used to represent them (to appear in Deliverable 7). The parser takes as input the syntactically parsed definitions from Birmingham (referred to from now on as the Birmingham input) and analyses them, using complex pattern-matching techniques, in order to derive and extract syntactic and semantic information. Testing of the first version has confirmed the validity of the core procedures, but a strategy based on string matching must be tested over a relatively large sample of data before we can identify all the potentially significant markers that permit us to extract meaningful information. This means that, at least in the early stages, we expect to have to add to the basic set of rules each time we test the parser over new samples of definitions. This report must thus be considered a description of work in progress. In discussing the changes now being implemented, we refer continually to the description of the first version of the parser presented in Deliverable 4 and to the templates used to represent the information extracted from the definitions. Examples of the new, revised templates are given in the Appendix.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Semantic
Knowledge Representation Formalisms and Methods
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/363132