CNR Institutional Research Information System

Processing the ITU vocabulary: revisions and adaptations to the Pisa syntactic-semantic parser

Peters C; Federici S; Montemagni S; Calzolari N

1993

Abstract

The first version of the Pisa syntactic-semantic parser was described in detail in Deliverable 4, Section 2 and Appendices 2, 3, and 4. This report discusses the testing of the parser on the sample vocabulary selected from the ITU Corpus (see Deliverable 6.1) and illustrates the revisions and extensions that are now being implemented. It therefore concentrates on analysis and extraction activities. We need to specify clearly all the kinds of information that can be extracted from the Cobuild definitions before completing the description of the type system that will be used to represent them (to appear in Deliverable 7). Our parser takes as input the syntactically parsed definitions from Birmingham (referred to from now on as the Birmingham input) and analyses them, using complex pattern matching techniques, in order to derive and extract syntactic and semantic information. While testing of the first version has confirmed the validity of the core procedures, a strategy based on string matching must be tested over a relatively large sample of data before we can identify all the potentially significant markers that allow us to extract meaningful information. This means that, at least in the early stages, each time we test the parser over a new sample of definitions we expect to have to add to the basic set of rules. This report should therefore be read as a description of work in progress. In discussing the changes that are now being implemented, we refer throughout to the description of the first version of the parser presented in Deliverable 4, and to the templates used to represent the information extracted from the definitions. Examples of the revised templates are given in the Appendix.


Year: 1993

Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI

Keywords: Semantic; Knowledge Representation Formalisms and Methods

Appears in types: 09.01 Nota tecnica (Technical note)

Files in this item:

File	Size	Format
prod_411855-doc_145008.pdf (open access; description: Processing the ITU vocabulary: revisions and adaptations to the Pisa syntactic-semantic parser)	1.85 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/363132
