Flexible Acquisition of Subcategorization Frames in Italian

Frontini Francesca; Quochi Valeria; Russo Irene
2012

Abstract

Lexica of predicate-argument structures are a useful tool for several tasks in NLP. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system: the first by combining information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data, and the second by annotating data extracted from a specialized corpus (environmental domain). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). The evaluation phase allowed us to identify the best empirical MLE threshold for the creation of a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned to each extracted lexicon entry a confidence score based on its relative frequency, and we evaluated the extractor on domain-specific data. The confidence score allows end users to select lexicon entries according to their reliability: one of the most interesting features of this work is that end users can customize the results of the SCF extractor, obtaining SCF lexica that differ in size and accuracy.
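The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that the MLE score of a frame is its relative frequency given the verb, count(verb, frame) / count(verb), and that entries scoring below a threshold are discarded. The function names, the frame labels, and the toy observations are hypothetical and are not taken from the paper or from the actual web service.

```python
from collections import Counter, defaultdict


def relative_frequencies(observations):
    """Return {verb: {frame: count(verb, frame) / count(verb)}}.

    `observations` is a list of (verb, frame) pairs, one per parsed
    occurrence. The relative frequency is used here as the MLE score
    and doubles as the confidence score of the lexicon entry.
    """
    pair_counts = Counter(observations)
    verb_counts = Counter(verb for verb, _ in observations)
    scores = defaultdict(dict)
    for (verb, frame), n in pair_counts.items():
        scores[verb][frame] = n / verb_counts[verb]
    return dict(scores)


def filter_lexicon(scores, threshold):
    """Keep the (verb, frame) entries whose score reaches the threshold."""
    return {
        (verb, frame)
        for verb, frames in scores.items()
        for frame, score in frames.items()
        if score >= threshold
    }


def precision_recall_f1(predicted, gold):
    """Evaluate a set of predicted entries against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: ten parsed occurrences of one verb.
observations = (
    [("mangiare", "subj#obj")] * 6
    + [("mangiare", "subj")] * 3
    + [("mangiare", "subj#comp-a")] * 1
)
gold = {("mangiare", "subj#obj"), ("mangiare", "subj")}

scores = relative_frequencies(observations)
for threshold in (0.05, 0.2, 0.5):
    lexicon = filter_lexicon(scores, threshold)
    p, r, f1 = precision_recall_f1(lexicon, gold)
    print(f"threshold={threshold:.2f} size={len(lexicon)} "
          f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Raising the threshold shrinks the lexicon and tends to trade recall for precision, which is the size/accuracy customization the abstract refers to. In this toy data a threshold of 0.5 keeps only the most frequent frame. The reported figures are also internally consistent: the harmonic mean of P=0.653 and R=0.557 rounds to F1=0.601.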
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Caselli Tommaso it
dc.authority.people Frontini Francesca it
dc.authority.people Quochi Valeria it
dc.authority.people Rubino Francesco it
dc.authority.people Russo Irene it
dc.authority.project Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies -
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 19:15:52 -
dc.date.available 2024/02/20 19:15:52 -
dc.date.issued 2012 -
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli", CNR, Italy -
dc.description.allpeople Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco; Russo, Irene -
dc.description.allpeopleoriginal Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco and Russo, Irene -
dc.description.fulltext none en
dc.description.numberofauthors 5 -
dc.identifier.isbn 9782951740877 -
dc.identifier.isi WOS:000323927702149 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/222834 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2012/summaries/390.html -
dc.language.iso eng -
dc.publisher.country FRA -
dc.publisher.name European Language Resources Association ELRA -
dc.publisher.place Paris -
dc.relation.alleditors Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis -
dc.relation.conferencedate 23-25 May 2012 -
dc.relation.conferencename Eighth International Conference on Language Resources and Evaluation (LREC'12) -
dc.relation.conferenceplace Istanbul, Turkey -
dc.relation.firstpage 2842 -
dc.relation.ispartofbook Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) -
dc.relation.lastpage 2848 -
dc.relation.numberofpages 7 -
dc.relation.projectAcronym PANACEA -
dc.relation.projectAwardNumber 248064 -
dc.relation.projectAwardTitle Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies -
dc.relation.projectFunderName - en
dc.relation.projectFundingStream FP7 -
dc.subject.keywords lexicon -
dc.subject.keywords automatic acquisition -
dc.subject.keywords subcategorisation frames -
dc.title Flexible Acquisition of Subcategorization Frames in Italian en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 287038 -
iris.isi.extIssued 2012 -
iris.isi.extTitle Flexible Acquisition of Verb Subcategorization Frames in Italian -
iris.orcid.lastModifiedDate 2025/03/02 07:54:42 *
iris.orcid.lastModifiedMillisecond 1740898482836 *
iris.sitodocente.maxattempts 2 -
isi.category OT *
isi.category OY *
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.name Tommaso -
isi.contributor.name Francesco -
isi.contributor.name Francesca -
isi.contributor.name Irene -
isi.contributor.name Valeria -
isi.contributor.researcherId HDN-3839-2022 -
isi.contributor.researcherId -
isi.contributor.researcherId MDT-6613-2025 -
isi.contributor.researcherId AAX-7808-2020 -
isi.contributor.researcherId E-7468-2011 -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.surname Caselli -
isi.contributor.surname Rubino -
isi.contributor.surname Frontini -
isi.contributor.surname Russo -
isi.contributor.surname Quochi -
isi.date.issued 2012 *
isi.description.abstracteng This paper describes a web-service system for automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system, the first by mixing together information from two lexica (one manually created and the second automatically acquired) and manual exploration of corpus data and the other annotating data extracted from a specialized corpus (domain environment). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). In addition to this, we assign to the extracted entries of the lexicon a confidence score and evaluate the extractor on domain specific data. The confidence score will allow the final user to easily select the entries of the lexicon in terms of their reliability. *
isi.description.allpeopleoriginal Caselli, T; Rubino, F; Frontini, F; Russo, I; Quochi, V; *
isi.document.sourcetype WOS.ISSHP *
isi.document.type Proceedings Paper *
isi.document.types Proceedings Paper *
isi.identifier.isi WOS:000323927702149 *
isi.journal.journaltitle LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION *
isi.language.original English *
isi.publisher.place 55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE *
isi.relation.firstpage 2842 *
isi.relation.lastpage 2848 *
isi.title Flexible Acquisition of Verb Subcategorization Frames in Italian *
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/222834