Flexible Acquisition of Subcategorization Frames in Italian
Frontini Francesca; Quochi Valeria; Russo Irene
2012
Abstract
Lexica of predicate-argument structures are a useful resource for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for evaluating the system: the first by combining information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data, and the second by annotating data extracted from a specialized corpus (environmental domain). Data filtering is performed by means of the maximum likelihood estimate (MLE). The evaluation phase allowed us to identify the best empirical MLE threshold for creating a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned each extracted lexicon entry a confidence score based on its relative frequency and evaluated the extractor on domain-specific data. The confidence score allows end users to select lexicon entries according to their reliability: one of the most interesting features of this work is that end users can customize the results of the SCF extractor, obtaining SCF lexica that differ in size and accuracy.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Caselli Tommaso | it |
| dc.authority.people | Frontini Francesca | it |
| dc.authority.people | Quochi Valeria | it |
| dc.authority.people | Rubino Francesco | it |
| dc.authority.people | Russo Irene | it |
| dc.authority.project | Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies | - |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 19:15:52 | - |
| dc.date.available | 2024/02/20 19:15:52 | - |
| dc.date.issued | 2012 | - |
| dc.description.abstracteng | Lexica of predicate-argument structures are a useful resource for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for evaluating the system: the first by combining information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data, and the second by annotating data extracted from a specialized corpus (environmental domain). Data filtering is performed by means of the maximum likelihood estimate (MLE). The evaluation phase allowed us to identify the best empirical MLE threshold for creating a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned each extracted lexicon entry a confidence score based on its relative frequency and evaluated the extractor on domain-specific data. The confidence score allows end users to select lexicon entries according to their reliability: one of the most interesting features of this work is that end users can customize the results of the SCF extractor, obtaining SCF lexica that differ in size and accuracy. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "A. Zampolli", CNR, Italy | - |
| dc.description.allpeople | Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco; Russo, Irene | - |
| dc.description.allpeopleoriginal | Caselli, Tommaso; Frontini, Francesca; Quochi, Valeria; Rubino, Francesco and Russo, Irene | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 5 | - |
| dc.identifier.isbn | 9782951740877 | - |
| dc.identifier.isi | WOS:000323927702149 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/222834 | - |
| dc.identifier.url | http://www.lrec-conf.org/proceedings/lrec2012/summaries/390.html | - |
| dc.language.iso | eng | - |
| dc.publisher.country | FRA | - |
| dc.publisher.name | European Language Resources Association ELRA | - |
| dc.publisher.place | Paris | - |
| dc.relation.alleditors | Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis | - |
| dc.relation.conferencedate | 23-25 May 2012 | - |
| dc.relation.conferencename | Eighth International Conference on Language Resources and Evaluation (LREC'12) | - |
| dc.relation.conferenceplace | Istanbul, Turkey | - |
| dc.relation.firstpage | 2842 | - |
| dc.relation.ispartofbook | Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) | - |
| dc.relation.lastpage | 2848 | - |
| dc.relation.numberofpages | 7 | - |
| dc.relation.projectAcronym | PANACEA | - |
| dc.relation.projectAwardNumber | 248064 | - |
| dc.relation.projectAwardTitle | Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies | - |
| dc.relation.projectFunderName | - | en |
| dc.relation.projectFundingStream | FP7 | - |
| dc.subject.keywords | lexicon | - |
| dc.subject.keywords | automatic acquisition | - |
| dc.subject.keywords | subcategorisation frames | - |
| dc.subject.singlekeyword | lexicon | * |
| dc.subject.singlekeyword | automatic acquisition | * |
| dc.subject.singlekeyword | subcategorisation frames | * |
| dc.title | Flexible Acquisition of Subcategorization Frames in Italian | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 287038 | - |
| iris.isi.extIssued | 2012 | - |
| iris.isi.extTitle | Flexible Acquisition of Verb Subcategorization Frames in Italian | - |
| iris.orcid.lastModifiedDate | 2025/03/02 07:54:42 | * |
| iris.orcid.lastModifiedMillisecond | 1740898482836 | * |
| iris.sitodocente.maxattempts | 2 | - |
| isi.category | OT | * |
| isi.category | OY | * |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Tommaso | - |
| isi.contributor.name | Francesco | - |
| isi.contributor.name | Francesca | - |
| isi.contributor.name | Irene | - |
| isi.contributor.name | Valeria | - |
| isi.contributor.researcherId | HDN-3839-2022 | - |
| isi.contributor.researcherId | - | |
| isi.contributor.researcherId | MDT-6613-2025 | - |
| isi.contributor.researcherId | AAX-7808-2020 | - |
| isi.contributor.researcherId | E-7468-2011 | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.surname | Caselli | - |
| isi.contributor.surname | Rubino | - |
| isi.contributor.surname | Frontini | - |
| isi.contributor.surname | Russo | - |
| isi.contributor.surname | Quochi | - |
| isi.date.issued | 2012 | * |
| isi.description.abstracteng | This paper describes a web-service system for automatic acquisition of verb subcategorization frames (SCFs) from parsed data in Italian. The system acquires SCFs in an unsupervised manner. We created two gold standards for the evaluation of the system, the first by mixing together information from two lexica (one manually created and the second automatically acquired) and manual exploration of corpus data and the other annotating data extracted from a specialized corpus (domain environment). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). In addition to this, we assign to the extracted entries of the lexicon a confidence score and evaluate the extractor on domain specific data. The confidence score will allow the final user to easily select the entries of the lexicon in terms of their reliability. | * |
| isi.description.allpeopleoriginal | Caselli, T; Rubino, F; Frontini, F; Russo, I; Quochi, V; | * |
| isi.document.sourcetype | WOS.ISSHP | * |
| isi.document.type | Proceedings Paper | * |
| isi.document.types | Proceedings Paper | * |
| isi.identifier.isi | WOS:000323927702149 | * |
| isi.journal.journaltitle | LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | * |
| isi.language.original | English | * |
| isi.publisher.place | 55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE | * |
| isi.relation.firstpage | 2842 | * |
| isi.relation.lastpage | 2848 | * |
| isi.title | Flexible Acquisition of Verb Subcategorization Frames in Italian | * |
Appears in types: | 04.01 Contribution in conference proceedings | |
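
The filtering step summarized in the abstract (keep a verb-frame pair only if its relative frequency, i.e. its MLE, reaches a threshold, and reuse that relative frequency as a confidence score) might look roughly like the sketch below. This is an illustration under stated assumptions, not the system's actual code: the function names, the frame labels, the toy counts, and the threshold value are all hypothetical, and the record does not say how the extractor represents frames internally.

```python
from collections import Counter, defaultdict


def build_lexicon(observations, threshold):
    """Filter (verb, frame) pairs by relative frequency.

    observations: iterable of (verb, frame) pairs, one per parsed occurrence.
    Returns {verb: {frame: relative_frequency}}. The relative frequency is
    kept as the confidence score of each retained entry.
    """
    pair_counts = Counter(observations)
    verb_counts = Counter()
    for (verb, _frame), n in pair_counts.items():
        verb_counts[verb] += n

    lexicon = defaultdict(dict)
    for (verb, frame), n in pair_counts.items():
        rel_freq = n / verb_counts[verb]  # MLE of P(frame | verb)
        if rel_freq >= threshold:
            lexicon[verb][frame] = rel_freq
    return dict(lexicon)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(lexicon, gold):
    """Type-level precision, recall and F1 against a gold {verb: set(frames)}."""
    extracted = {(v, f) for v, frames in lexicon.items() for f in frames}
    gold_pairs = {(v, f) for v, frames in gold.items() for f in frames}
    true_pos = len(extracted & gold_pairs)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Hypothetical toy data: 10 parsed occurrences of one verb.
    toy = (
        [("mangiare", "SUBJ-OBJ")] * 6
        + [("mangiare", "SUBJ")] * 3
        + [("mangiare", "SUBJ-COMP_a")] * 1
    )
    gold = {"mangiare": {"SUBJ-OBJ", "SUBJ", "SUBJ-OBJ-COMP_con"}}
    lexicon = build_lexicon(toy, threshold=0.2)
    print(lexicon)
    print(evaluate(lexicon, gold))
```

Raising the threshold yields a smaller lexicon with more reliable entries, and lowering it yields a larger, noisier one, which is the size/accuracy trade-off the abstract leaves to the end user. As a consistency check on the reported figures, `f1_score(0.653, 0.557)` evaluates to approximately 0.601.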
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


