L-LEME: an Automatic Lexical Merger based on the LMF Standard
Riccardo Del Gratta; Francesca Frontini; Monica Monachini; Valeria Quochi; Francesco Rubino; Matteo Abrate; Angelica Lo Duca
2012
Abstract
This paper describes the LMF LExical MErger (L-LEME), an architecture for combining two lexicons to obtain one or more new resources. L-LEME relies on standards, exploiting the ISO Lexical Markup Framework (LMF) to ensure interoperability. It is designed to be dynamic and highly adaptable: users can configure it to meet their specific needs. The architecture consists of two main modules. The Mapper takes as input two lexicons, A and B, together with a set of user-defined rules and instructions that guide the mapping process (Directives D), and outputs all matching entries. The algorithm also computes a cosine similarity score. The Builder takes as input the Mapper's results and a set of Directives D1, and produces a new LMF lexicon C. The Directives allow users to define their own building rules and different merging scenarios. L-LEME is applied to a concrete task within the PANACEA project: the merging of two Italian SubCategorization Frame (SCF) lexicons. The experiment is interesting because A and B rest on different principles: A was built by human introspection, whereas B was automatically extracted. Ultimately, L-LEME has interesting repercussions for many language technology applications.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di informatica e telematica - IIT | - |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Riccardo Del Gratta | it |
| dc.authority.people | Francesca Frontini | it |
| dc.authority.people | Monica Monachini | it |
| dc.authority.people | Valeria Quochi | it |
| dc.authority.people | Francesco Rubino | it |
| dc.authority.people | Matteo Abrate | it |
| dc.authority.people | Angelica Lo Duca | it |
| dc.authority.project | Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies | - |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di informatica e telematica - IIT | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 912 | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/15 19:19:15 | - |
| dc.date.available | 2024/02/15 19:19:15 | - |
| dc.date.issued | 2012 | - |
| dc.description.affiliations | CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-IIT, Pisa, Italy; CNR-IIT, Pisa, Italy | - |
| dc.description.allpeople | DEL GRATTA, Riccardo; Frontini, Francesca; Monachini, Monica; Quochi, Valeria; Rubino, Francesco; Abrate, Matteo; LO DUCA, Angelica | - |
| dc.description.allpeopleoriginal | Riccardo Del Gratta, Francesca Frontini, Monica Monachini, Valeria Quochi, Francesco Rubino, Matteo Abrate, Angelica Lo Duca | - |
| dc.description.fulltext | none | en |
| dc.description.note | ID_PUMA; /cnr.iit/2012-A2-035 cnr.iit/2012-A2-020 | - |
| dc.description.numberofauthors | 7 | - |
| dc.identifier.isbn | 978-2-9517408-7-7 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/117790 | - |
| dc.language.iso | eng | - |
| dc.miur.last.status.update | 2024-10-02T13:14:55Z | * |
| dc.relation.alleditors | Bel N., Gavrilidou M., Monachini M., Quochi V., Rimell L. | - |
| dc.relation.conferencedate | 2012 | - |
| dc.relation.conferencename | The Eighth International Conference on Language Resources and Evaluation (LREC) 2012 | - |
| dc.relation.conferenceplace | Istanbul, Turkey | - |
| dc.relation.firstpage | 31 | - |
| dc.relation.ispartofbook | Proceedings of the LREC 2012 Workshop on Language Resource Merging | - |
| dc.relation.lastpage | 40 | - |
| dc.relation.numberofpages | 10 | - |
| dc.relation.projectAcronym | PANACEA | - |
| dc.relation.projectAwardNumber | 248064 | - |
| dc.relation.projectAwardTitle | Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies | - |
| dc.relation.projectFunderName | - | en |
| dc.relation.projectFundingStream | FP7 | - |
| dc.subject.keywords | LMF | - |
| dc.subject.keywords | Lexicon mapping | - |
| dc.subject.keywords | similarity score | - |
| dc.subject.singlekeyword | LMF | * |
| dc.subject.singlekeyword | Lexicon mapping | * |
| dc.subject.singlekeyword | similarity score | * |
| dc.title | L-LEME: an Automatic Lexical Merger based on the LMF Standard | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 223098 | - |
| iris.orcid.lastModifiedDate | 2024/04/04 14:24:38 | * |
| iris.orcid.lastModifiedMillisecond | 1712233478051 | * |
| iris.sitodocente.maxattempts | 1 | - |
| Appears in types: | 04.01 Contribution in conference proceedings | |
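
The two-module flow summarized in the abstract (a Mapper that scores candidate pairs with cosine similarity, followed by a Builder that assembles lexicon C) can be illustrated with a minimal Python sketch. The abstract does not say how entries are turned into vectors or what the Directives look like, so everything here is an assumption made for illustration: entries are plain dictionaries of feature/value pairs rather than LMF XML, and the names `mapper`, `builder`, `features`, `threshold`, `scenario`, and `prefer` are hypothetical, not L-LEME's actual interface.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vectorize(entry, features):
    """Assumed representation: one dimension per (feature, value) pair."""
    return Counter((f, entry[f]) for f in features if f in entry)


def mapper(lex_a, lex_b, directives):
    """Return (id_a, id_b, score) for pairs scoring at or above the threshold."""
    features = directives["features"]    # hypothetical directive
    threshold = directives["threshold"]  # hypothetical directive
    matches = []
    for id_a, entry_a in lex_a.items():
        vec_a = vectorize(entry_a, features)
        for id_b, entry_b in lex_b.items():
            score = cosine(vec_a, vectorize(entry_b, features))
            if score >= threshold:
                matches.append((id_a, id_b, score))
    return matches


def builder(lex_a, lex_b, matches, directives):
    """Assemble lexicon C from the matches according to a merging scenario."""
    prefer_a = directives["prefer"] == "A"  # which source wins on conflicts
    lexicon_c = {}
    matched_a, matched_b = set(), set()
    for id_a, id_b, _score in matches:
        primary, secondary = (
            (lex_a[id_a], lex_b[id_b]) if prefer_a else (lex_b[id_b], lex_a[id_a])
        )
        lexicon_c[f"{id_a}+{id_b}"] = {**secondary, **primary}
        matched_a.add(id_a)
        matched_b.add(id_b)
    if directives["scenario"] == "union":
        # Also keep entries that found no counterpart in the other lexicon.
        for id_a, entry in lex_a.items():
            if id_a not in matched_a:
                lexicon_c[id_a] = dict(entry)
        for id_b, entry in lex_b.items():
            if id_b not in matched_b:
                lexicon_c[id_b] = dict(entry)
    return lexicon_c


# Toy subcategorization-frame entries (invented for this example).
lex_a = {"a1": {"subj": "NP", "obj": "NP"}, "a2": {"subj": "NP", "comp": "PP-a"}}
lex_b = {"b1": {"subj": "NP", "obj": "NP"}, "b2": {"subj": "NP", "comp": "PP-di"}}

matches = mapper(lex_a, lex_b, {"features": ["subj", "obj", "comp"], "threshold": 0.9})
lexicon_c = builder(lex_a, lex_b, matches, {"scenario": "union", "prefer": "A"})
```

In this toy setup only `a1` and `b1` clear the threshold, so a `"union"` scenario yields one merged entry plus the two unmatched ones, while an `"intersection"` scenario would keep the merged entry alone. Separating the scoring step from the assembly step mirrors the abstract's point that mapping and building are guided by different sets of Directives.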


