Flexible Acquisition of Subcategorization Frames in Italian

Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco; Russo, Irene

Lexica of predicate-argument structures are a useful resource for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed Italian data. The system acquires SCFs in an unsupervised manner. We created two gold standards to evaluate the system: the first combines information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data; the second was built by annotating data extracted from a specialized corpus (environmental domain). Data filtering is performed by means of the maximum likelihood estimate (MLE). The evaluation allowed us to identify the best empirical MLE threshold for creating a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned each extracted lexicon entry a confidence score based on its relative frequency and evaluated the extractor on domain-specific data. The confidence score allows end users to select lexicon entries easily according to their reliability: one of the most interesting features of this work is that end users can customize the results of the SCF extractor, obtaining SCF lexica that differ in size and accuracy.
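The filtering step described in the abstract can be illustrated with a minimal Python sketch. It assumes that the MLE score of a frame is its relative frequency with a given verb and that this same value serves as the confidence score. The abstract does not give the exact formulation, so the function names, frame labels, toy counts, and the 0.2 threshold below are all hypothetical.

```python
from collections import Counter, defaultdict


def mle_filter(observations, threshold):
    """Keep (verb, frame) pairs whose relative frequency reaches the threshold.

    observations: iterable of (verb, frame) pairs, one per parsed occurrence.
    Returns {verb: {frame: relative_frequency}}; the relative frequency is
    reused here as the confidence score of the entry (an assumption).
    """
    observations = list(observations)
    verb_counts = Counter(verb for verb, _ in observations)
    pair_counts = Counter(observations)

    lexicon = defaultdict(dict)
    for (verb, frame), count in pair_counts.items():
        score = count / verb_counts[verb]  # MLE of P(frame | verb)
        if score >= threshold:
            lexicon[verb][frame] = score
    return dict(lexicon)


def precision_recall_f1(predicted, gold):
    """Score a set of predicted (verb, frame) pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: ten parsed occurrences of one verb.
toy_observations = (
    [("mangiare", "SUBJ_OBJ")] * 6
    + [("mangiare", "SUBJ")] * 3
    + [("mangiare", "SUBJ_COMP-a")] * 1
)

# Hypothetical gold standard with one frame the extractor never observes.
toy_gold = {
    ("mangiare", "SUBJ_OBJ"),
    ("mangiare", "SUBJ"),
    ("mangiare", "SUBJ_OBJ_COMP-con"),
}

toy_lexicon = mle_filter(toy_observations, threshold=0.2)
toy_predicted = {
    (verb, frame) for verb, frames in toy_lexicon.items() for frame in frames
}
toy_scores = precision_recall_f1(toy_predicted, toy_gold)
```

Raising the threshold yields a smaller lexicon that is typically more precise, while lowering it favours recall; this is the trade-off that end users can tune through the confidence score. The reported F1 is consistent with the harmonic mean of the reported precision and recall: 2 · 0.653 · 0.557 / (0.653 + 0.557) ≈ 0.601.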

2012

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Caselli Tommaso	it
dc.authority.people	Frontini Francesca	it
dc.authority.people	Quochi Valeria	it
dc.authority.people	Rubino Francesco	it
dc.authority.people	Russo Irene	it
dc.authority.project	Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies	-
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/20 19:15:52	-
dc.date.available	2024/02/20 19:15:52	-
dc.date.issued	2012	-
dc.description.abstracteng	Lexica of predicate-argument structures are a useful resource for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed Italian data. The system acquires SCFs in an unsupervised manner. We created two gold standards to evaluate the system: the first combines information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data; the second was built by annotating data extracted from a specialized corpus (environmental domain). Data filtering is performed by means of the maximum likelihood estimate (MLE). The evaluation allowed us to identify the best empirical MLE threshold for creating a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned each extracted lexicon entry a confidence score based on its relative frequency and evaluated the extractor on domain-specific data. The confidence score allows end users to select lexicon entries easily according to their reliability: one of the most interesting features of this work is that end users can customize the results of the SCF extractor, obtaining SCF lexica that differ in size and accuracy.	-
dc.description.affiliations	Istituto di Linguistica Computazionale "A. Zampolli", CNR, Italy	-
dc.description.allpeople	Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco; Russo, Irene	-
dc.description.allpeopleoriginal	Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco and Russo, Irene	-
dc.description.fulltext	none	en
dc.description.numberofauthors	5	-
dc.identifier.isbn	9782951740877	-
dc.identifier.isi	WOS:000323927702149	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/222834	-
dc.identifier.url	http://www.lrec-conf.org/proceedings/lrec2012/summaries/390.html	-
dc.language.iso	eng	-
dc.publisher.country	FRA	-
dc.publisher.name	European Language Resources Association ELRA	-
dc.publisher.place	Paris	-
dc.relation.alleditors	Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis	-
dc.relation.conferencedate	23-25 May 2012	-
dc.relation.conferencename	Eighth International Conference on Language Resources and Evaluation (LREC'12)	-
dc.relation.conferenceplace	Istanbul, Turkey	-
dc.relation.firstpage	2842	-
dc.relation.ispartofbook	Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)	-
dc.relation.lastpage	2848	-
dc.relation.numberofpages	7	-
dc.relation.projectAcronym	PANACEA	-
dc.relation.projectAwardNumber	248064	-
dc.relation.projectAwardTitle	Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies	-
dc.relation.projectFunderName	-	en
dc.relation.projectFundingStream	FP7	-
dc.subject.keywords	lexicon	-
dc.subject.keywords	automatic acquisition	-
dc.subject.keywords	subcategorisation frames	-
dc.subject.singlekeyword	lexicon	*
dc.subject.singlekeyword	automatic acquisition	*
dc.subject.singlekeyword	subcategorisation frames	*
dc.title	Flexible Acquisition of Subcategorization Frames in Italian	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	287038	-
iris.isi.extIssued	2012	-
iris.isi.extTitle	Flexible Acquisition of Verb Subcategorization Frames in Italian	-
iris.orcid.lastModifiedDate	2025/03/02 07:54:42	*
iris.orcid.lastModifiedMillisecond	1740898482836	*
iris.sitodocente.maxattempts	2	-
isi.category	OT	*
isi.category	OY	*
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.name	Tommaso	-
isi.contributor.name	Francesco	-
isi.contributor.name	Francesca	-
isi.contributor.name	Irene	-
isi.contributor.name	Valeria	-
isi.contributor.researcherId	HDN-3839-2022	-
isi.contributor.researcherId		-
isi.contributor.researcherId	MDT-6613-2025	-
isi.contributor.researcherId	AAX-7808-2020	-
isi.contributor.researcherId	E-7468-2011	-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation		-
isi.contributor.surname	Caselli	-
isi.contributor.surname	Rubino	-
isi.contributor.surname	Frontini	-
isi.contributor.surname	Russo	-
isi.contributor.surname	Quochi	-
isi.date.issued	2012	*
isi.description.abstracteng	This paper describes a web-service system for automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system, the first by mixing together information from two lexica (one manually created and the second automatically acquired) and manual exploration of corpus data and the other annotating data extracted from a specialized corpus (domain environment). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). In addition to this, we assign to the extracted entries of the lexicon a confidence score and evaluate the extractor on domain specific data. The confidence score will allow the final user to easily select the entries of the lexicon in terms of their reliability.	*
isi.description.allpeopleoriginal	Caselli, T; Rubino, F; Frontini, F; Russo, I; Quochi, V;	*
isi.document.sourcetype	WOS.ISSHP	*
isi.document.type	Proceedings Paper	*
isi.document.types	Proceedings Paper	*
isi.identifier.isi	WOS:000323927702149	*
isi.journal.journaltitle	LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION	*
isi.language.original	English	*
isi.publisher.place	55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE	*
isi.relation.firstpage	2842	*
isi.relation.lastpage	2848	*
isi.title	Flexible Acquisition of Verb Subcategorization Frames in Italian	*
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/222834

CNR Institutional Research Information System