Processing the ITU vocabulary: revisions and adaptations to the Pisa syntactic-semantic parser

Peters C; Montemagni S
1993

Abstract

The first version of the Pisa syntactic-semantic parser was described in detail in Deliverable 4, Section 2 and Appendices 2, 3, and 4. This report discusses the testing of the parser on the sample vocabulary selected from the ITU Corpus (see Deliverable 6.1) and illustrates the revisions and extensions now being implemented. It therefore concentrates on analysis and extraction activities: we need to specify clearly all the kinds of information that can be extracted from the Cobuild definitions before completing the description of the type system that will be used to represent them (to appear in Deliverable 7). The parser takes as input the syntactically parsed definitions from Birmingham (referred to from now on as the Birmingham input) and analyses them, using complex pattern-matching techniques, to derive and extract syntactic and semantic information. Testing of the first version has confirmed the validity of the core procedures, but a strategy based on string matching must be tested over a relatively large sample of data before we can identify all the potentially significant markers that permit us to extract meaningful information. At least in the early stages, then, we expect to have to add to the basic set of rules each time we test the parser over new samples of definitions. This report must thus be considered a description of work in progress. In discussing the changes now being implemented, we refer continually to the description of the first version of the parser presented in Deliverable 4 and to the templates used to represent the information extracted from the definitions. Examples of the revised templates are given in the Appendix.
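The marker-based strategy outlined in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the rule names, the regular expressions, the slot names (headword, genus, differentia, paraphrase), and the example sentences are invented, and it works on plain definition strings instead of the Birmingham input, whose format is not given here. It is not the Pisa parser's actual rule set or template layout.

```python
import re

# Hypothetical marker rules: each pairs a surface pattern with the slots it fills.
# These patterns are illustrative only; they are not the parser's actual rules.
RULES = [
    # "A/An X is a/an Y that/which ..." -> genus term plus optional differentia
    ("noun_genus",
     re.compile(r"^(?:A|An)\s+(?P<headword>[\w-]+)\s+is\s+(?:a|an)\s+"
                r"(?P<genus>[\w-]+)"
                r"(?:\s+(?:that|which)\s+(?P<differentia>.+?))?\.?$")),
    # "If you X something, you Y." -> transitive verb frame plus paraphrase
    ("verb_transitive",
     re.compile(r"^If you\s+(?P<headword>[\w-]+)\s+something,\s+you\s+"
                r"(?P<paraphrase>.+?)\.?$")),
]


def extract(definition):
    """Return a flat template for the first rule that matches, else None."""
    text = definition.strip()
    for name, pattern in RULES:
        match = pattern.match(text)
        if match:
            template = {"rule": name}
            # Keep only the slots that were actually filled.
            template.update({k: v for k, v in match.groupdict().items() if v})
            return template
    # An unmatched definition signals that a new rule may be needed.
    return None


if __name__ == "__main__":
    samples = [
        "A kettle is a container that is used for boiling water.",
        "If you boil something, you heat it until it bubbles.",
        "Something that is hot has a high temperature.",
    ]
    for sample in samples:
        print(extract(sample))
```

In this sketch the third sample matches no rule and yields None. That mirrors the situation described above, where each new sample of definitions may expose markers the existing rules do not cover, so the rule list has to grow.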
DC field Value Language
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI -
dc.authority.people Peters C it
dc.authority.people Federici S it
dc.authority.people Montemagni S it
dc.authority.people Calzolari N it
dc.collection.id.s d0cf8945-15df-4d4b-8de3-757cc3fb7178 *
dc.collection.name 09.01 Technical note *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 973 *
dc.date.accessioned 2024/02/16 21:36:43 -
dc.date.available 2024/02/16 21:36:43 -
dc.date.issued 1993 -
dc.description.affiliations CNR-IEI, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy -
dc.description.allpeople Peters, C; Federici, S; Montemagni, S; Calzolari, N -
dc.description.allpeopleoriginal Peters C.; Federici S.; Montemagni S.; Calzolari N. -
dc.description.fulltext open en
dc.description.note Internal report - PuMa code: cnr.iei/1993-B4-036 -
dc.description.numberofauthors 4 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/363132 -
dc.language.iso eng -
dc.relation.firstpage 1 -
dc.relation.lastpage 17 -
dc.relation.numberofpages 19 -
dc.subject.keywords Semantic -
dc.subject.keywords Knowledge Representation Formalisms and Methods -
dc.subject.singlekeyword Semantic *
dc.subject.singlekeyword Knowledge Representation Formalisms and Methods *
dc.title Processing the ITU vocabulary: revisions and adaptations to the Pisa syntactic-semantic parser en
dc.type.driver info:eu-repo/semantics/other -
dc.type.full 09 Technical documentation::09.01 Technical note it