Combining statistical techniques and lexico-syntactic patterns for semantic relations extraction from text

Emiliano Giovannetti
2008

Abstract

Semantic relation extraction is a crucial task in Ontology Learning from Texts. In the literature, unsupervised statistical systems are used for semantic relation extraction: these systems typically detect pairs of semantically related terms (on the basis of their distribution in texts) without specifying which semantic relation holds between them. In this work we propose a fully unsupervised approach to validating and extracting semantic relations from texts. A statistical component (CLASS, CLustering through Analogy-based Semantic Similarity) is used to obtain pairs of distributionally similar terms that occur in similar contexts and may therefore be involved in paradigmatic relations (for instance, the words "car" and "motorcycle" in the sentences "I drive my car" and "Bob drives his motorcycle").

To validate and label the anonymous relations produced by the statistical module, the system searches the Web for occurrences of each candidate term pair within reliable lexico-syntactic patterns, where the two terms are involved in a syntagmatic relation (for example, "steering wheel" and "car" in the sentence "the steering wheel is part of the car"). This work focuses on the definition and application of these lexico-syntactic patterns and on the measures used to assess the reliability of the specific semantic relation the system suggests.

The semantic relations considered are hyponymy, meronymy, co-hyponymy and co-meronymy, chosen for their relevance to ontology construction. Different lexico-syntactic patterns are used for different kinds of relations. Patterns containing both terms are used for hyponymy and meronymy discovery (e.g. "cyclosporine is a medicine"): the number of occurrences of the pattern on the Web indicates the confidence of the candidate semantic relation. For co-hyponymy and co-meronymy, explorative open patterns containing just one term are used.
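The abstract states only that Web hit counts for patterns containing both terms indicate the confidence of a hyponymy or meronymy candidate. The following is a minimal sketch of that idea under explicit assumptions, not the author's implementation: a tiny in-memory corpus (`TOY_CORPUS`) stands in for Web search, the templates in `PAIR_PATTERNS` are hypothetical examples modeled on "cyclosporine is a medicine" and "X is part of Y", and the hypothetical helper `label_pair` simply picks the relation with the most hits.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Web: the described system counts pattern hits
# on the Web, while this sketch scans a tiny local corpus.
TOY_CORPUS = [
    "cyclosporine is a medicine used after organ transplants",
    "in practice cyclosporine is a medicine with strong side effects",
    "the nucleus is part of the atom",
]

# Hypothetical templates containing both terms; {x} and {y} are the
# candidate pair in a fixed order (hyponym/hypernym, part/whole).
PAIR_PATTERNS = {
    "hyponymy": ["{x} is a {y}", "{x} is an {y}"],
    "meronymy": ["{x} is part of the {y}", "{x} is part of {y}"],
}


def count_hits(phrase, corpus):
    """Count case-insensitive, word-bounded occurrences of a phrase."""
    regex = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(len(regex.findall(doc)) for doc in corpus)


def score_pair(x, y, corpus):
    """Sum the hits of every template of each relation for the pair (x, y)."""
    scores = Counter()
    for relation, templates in PAIR_PATTERNS.items():
        for template in templates:
            scores[relation] += count_hits(template.format(x=x, y=y), corpus)
    return scores


def label_pair(x, y, corpus, min_hits=1):
    """Return (relation, hits) for the best-supported relation, or (None, 0)."""
    relation, hits = score_pair(x, y, corpus).most_common(1)[0]
    return (relation, hits) if hits >= min_hits else (None, 0)


print(label_pair("cyclosporine", "medicine", TOY_CORPUS))  # ('hyponymy', 2)
print(label_pair("nucleus", "atom", TOY_CORPUS))           # ('meronymy', 1)
print(label_pair("medicine", "cyclosporine", TOY_CORPUS))  # (None, 0)
```

Because each template fixes the order of the two terms, the reversed pair ("medicine", "cyclosporine") receives no hits, which is how the direction of the relation is recovered in this sketch.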
For example, to test whether a co-meronymy relation holds within the term pair "electron-nucleus", the system applies the two open patterns "electron is part of" and "nucleus is part of" and then looks for common holonyms (e.g. "atom"). For evaluation, two different measures have been defined: one for hyponymy and meronymy relations and the other for co-hyponymy and co-meronymy. Both measures are built primarily on the number of occurrences of the patterns on the Web; for co-hyponymy and co-meronymy, they also take into account the number of hypernyms (or holonyms) the two terms share.
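A minimal sketch of the co-meronymy (and co-hyponymy) check, again with a toy corpus in place of Web search. The regexes in `OPEN_PATTERNS` are hypothetical renderings of the open patterns "X is part of" and "X is a": they fix one term, optionally skip an article (an assumption), and capture the following word as a candidate holonym or hypernym. The function `co_relation_score` is a made-up stand-in for the paper's measure, which the abstract describes only as combining pattern occurrence counts with the number of shared hypernyms (or holonyms); here each shared target contributes the smaller of its two hit counts.

```python
import re
from collections import Counter

# Hypothetical stand-in for Web search results.
TOY_CORPUS = [
    "the electron is part of the atom",
    "an electron is part of the atom",
    "the nucleus is part of the atom",
    "the nucleus is part of the cell",
    "an electron is a particle",
    "a proton is a particle",
]

# Hypothetical open patterns: only one term ({x}) is fixed, and the captured
# word after it is a candidate holonym (co-meronymy) or hypernym (co-hyponymy).
OPEN_PATTERNS = {
    "co-meronymy": r"\b{x} is part of (?:the |an |a )?(\w+)",
    "co-hyponymy": r"\b{x} is (?:an|a) (\w+)",
}


def collect_targets(term, relation, corpus):
    """Count the holonyms/hypernyms captured by the open pattern for one term."""
    pattern = OPEN_PATTERNS[relation].format(x=re.escape(term))
    regex = re.compile(pattern, re.IGNORECASE)
    targets = Counter()
    for doc in corpus:
        targets.update(match.lower() for match in regex.findall(doc))
    return targets


def co_relation_score(x, y, relation, corpus):
    """Hypothetical measure: shared targets, each weighted by its smaller hit count."""
    targets_x = collect_targets(x, relation, corpus)
    targets_y = collect_targets(y, relation, corpus)
    shared = sorted(set(targets_x) & set(targets_y))
    score = sum(min(targets_x[t], targets_y[t]) for t in shared)
    return shared, score


print(co_relation_score("electron", "nucleus", "co-meronymy", TOY_CORPUS))  # (['atom'], 1)
print(co_relation_score("electron", "proton", "co-hyponymy", TOY_CORPUS))   # (['particle'], 1)
```

With this toy corpus, "electron" and "nucleus" share the holonym "atom", supporting a co-meronymy label, while "cell" (a holonym of "nucleus" only) contributes nothing to the score.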
DC field Value Language
dc.authority.people Emiliano Giovannetti it
dc.collection.id.s 33fc2b58-b895-438b-9d2a-2c5bc86a83a6 *
dc.collection.name 04.04 Presentation/Communication not published in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/16 17:15:52 -
dc.date.available 2024/02/16 17:15:52 -
dc.date.issued 2008 -
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli" - CNR -
dc.description.allpeople Giovannetti, Emiliano -
dc.description.allpeopleoriginal Emiliano Giovannetti -
dc.description.fulltext none en
dc.description.numberofauthors 1 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/244997 -
dc.identifier.url http://www.iet.unipi.it/dottinformazione/Workshop/Anno2008/English.html -
dc.language.iso eng -
dc.relation.conferencedate 14 November 2008 -
dc.relation.conferencename Advances in Computer Systems and Networks. Doctoral Workshop 2008 -
dc.relation.conferenceplace Pisa -
dc.relation.ispartofbook Advances in Computer Systems and Networks. Doctoral Workshop 2008. -
dc.subject.keywords lexico-syntactic patterns -
dc.subject.keywords semantic relations extraction from text -
dc.subject.keywords ontology learning -
dc.subject.singlekeyword lexico-syntactic patterns *
dc.subject.singlekeyword semantic relations extraction from text *
dc.subject.singlekeyword ontology learning *
dc.title Combining statistical techniques and lexico-syntactic patterns for semantic relations extraction from text en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.04 Presentation/Communication not published in conference proceedings it
dc.type.miur -2.0 -
dc.type.referee No -
dc.ugov.descaux1 282636 -
iris.orcid.lastModifiedDate 2024/04/04 10:38:20 *
iris.orcid.lastModifiedMillisecond 1712219900066 *
iris.sitodocente.maxattempts 1 -
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/244997