Semantic relation labelling in ontology learning from texts

Emiliano Giovannetti
2009

Abstract

Semantic relation extraction is a crucial task in Ontology Learning from Texts. In the literature, unsupervised statistical systems are commonly used for this task: they typically detect pairs of semantically related terms on the basis of their distribution in texts, without specifying the semantic relation holding between them. In this work I propose a fully unsupervised approach to the validation and extraction of semantic relations from texts. A statistical component (CLASS, CLustering through Analogy-based Semantic Similarity) produces a set of pairs of "distributionally similar" terms, that is, terms occurring in similar contexts and possibly involved in "paradigmatic" relations (for instance, the words "car" and "motorcycle" in the sentences "I drive my car" and "Bob drives his motorcycle"). To validate and label the anonymous relations obtained through the statistical module, occurrences of the candidate term pairs are searched for on the Web in the context of "reliable" lexico-syntactic patterns, where the terms are involved in a "syntagmatic" relation (for example, the words "steer" and "car" in the sentence "steer is part of the car"). This work focuses on the definition and application of the lexico-syntactic patterns and on the measures used to assess the reliability of the specific semantic relation the system suggests. The chosen semantic relations are hyponymy, meronymy, co-hyponymy and co-meronymy, selected for their relevance to ontology construction. Different lexico-syntactic patterns are used for different kinds of relations. Patterns including both terms (e.g. "cyclosporine is a medicine") are used for hyponymy and meronymy discovery: the number of occurrences of the pattern on the Web indicates the confidence of the candidate semantic relation. For co-hyponymy and co-meronymy, explorative "open" patterns, which include just one term, are used. For example, given the term pair "electron-nucleus", we can check whether a co-meronymy relation holds between the two terms by applying the patterns "electron is part of" and "nucleus is part of" and then looking for common holonyms (e.g. "atom"). For evaluation, two different measures have been defined, one for hyponymy and meronymy and the other for co-hyponymy and co-meronymy. Both measures are built upon the number of occurrences of the patterns on the Web; for co-hyponymy and co-meronymy, the measure also takes into account the number of common hypernyms (or holonyms) shared by the two terms.
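
The two kinds of patterns described in the abstract can be illustrated with a minimal Python sketch. It is not the system presented in this work: a small list of invented sentences (`SNIPPETS`) stands in for the Web, the function names are hypothetical, and raw counts are used in place of the reliability measures, whose definitions the abstract does not give.

```python
import re
from collections import Counter

# Toy stand-in for Web search results; these sentences are invented for illustration.
SNIPPETS = [
    "The electron is part of the atom.",
    "An electron is part of the atom and carries negative charge.",
    "The nucleus is part of the atom.",
    "The nucleus is part of the cell.",
    "Cyclosporine is a medicine used after transplants.",
    "Cyclosporine is a medicine.",
]


def count_closed(pattern, snippets):
    """Count the snippets containing a closed pattern (both terms filled in)."""
    needle = pattern.lower()
    return sum(needle in s.lower() for s in snippets)


def open_fillers(term, relation, snippets):
    """Collect the words that complete an open pattern such as 'electron is part of'."""
    regex = re.compile(
        rf"\b{re.escape(term)} {re.escape(relation)} (?:the |a |an )?(\w+)",
        re.IGNORECASE,
    )
    fillers = Counter()
    for s in snippets:
        for match in regex.finditer(s):
            fillers[match.group(1).lower()] += 1
    return fillers


def common_fillers(term_a, term_b, relation, snippets):
    """Return the fillers shared by both open patterns, with the smaller of the two counts."""
    a = open_fillers(term_a, relation, snippets)
    b = open_fillers(term_b, relation, snippets)
    return {w: min(a[w], b[w]) for w in a.keys() & b.keys()}


# Closed pattern: a higher count suggests more confidence in the hyponymy candidate.
closed_hits = count_closed("cyclosporine is a medicine", SNIPPETS)

# Open patterns: a shared holonym suggests co-meronymy for the pair "electron-nucleus".
shared = common_fillers("electron", "nucleus", "is part of", SNIPPETS)
```

On this toy data the pair "electron-nucleus" shares the single holonym "atom", while "cell" is discarded because it completes only one of the two open patterns. A real implementation would replace `SNIPPETS` with search engine hit counts or retrieved text and would need a principled way to normalise the counts.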
semantic relation extraction
ontology learning from text
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/244995