Combining statistical techniques and lexico-syntactic patterns for semantic relations extraction from text

Emiliano Giovannetti
2008

Abstract

Semantic relation extraction is a crucial task for Ontology Learning from Texts. In the literature, unsupervised statistical systems are used for semantic relation extraction: these systems typically detect pairs of semantically related terms (on the basis of their distribution in texts) without specifying the semantic relation holding between them. In this work we propose a fully unsupervised approach for semantic relation validation and extraction from texts. A statistical component (CLASS, CLustering through Analogy-based Semantic Similarity) is used to obtain a set of pairs of distributionally similar terms, that is, terms occurring in similar contexts and possibly involved in paradigmatic relations (as, for instance, the words "car" and "motorcycle" in the sentences "I drive my car" and "Bob drives his motorcycle"). To validate and label the anonymous relations obtained through the statistical module, occurrences of the candidate pairs of terms are searched for on the Web in the context of reliable lexico-syntactic patterns, where they are involved in a syntagmatic relation (such as, for example, the words "steer" and "car" in the sentence "steer is part of the car"). This work focuses on the definition and application of the lexico-syntactic patterns and on the measures used to assess the reliability of the specific semantic relation the system suggests. The chosen semantic relations are hyponymy, meronymy, co-hyponymy and co-meronymy, selected for their relevance in ontology construction. Different lexico-syntactic patterns are used for different kinds of relations. In particular, patterns including both terms are used for hyponymy and meronymy discovery (e.g. "cyclosporine is a medicine"): the number of occurrences of the pattern on the Web indicates the confidence of the candidate semantic relation. For co-hyponymy and co-meronymy, explorative open patterns, including just one term, are used.
For example, given the term pair "electron-nucleus", we can check whether a co-meronymy relation holds between them by applying the two patterns "electron is part of" and "nucleus is part of" and then looking for common holonyms (e.g. "atom"). Concerning evaluation, two different measures have been defined, one for hyponymy and meronymy relations and the other for co-hyponymy and co-meronymy. The measures are built upon the number of occurrences of the patterns on the Web and, for co-hyponymy and co-meronymy, on the number of common hypernyms (or holonyms) shared between the terms.
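
The two kinds of patterns described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The pattern templates are modelled on the examples quoted above, a small in-memory list of invented sentences (TOY_CORPUS) stands in for the Web, and the function names closed_pattern_count, open_pattern_fillers and shared_fillers are hypothetical. The abstract does not give the formulas of the two reliability measures, so the sketch only returns the raw quantities they are said to build on: pattern occurrence counts and the set of common hypernyms or holonyms.

```python
import re
from collections import Counter

# Invented sentences standing in for the Web (assumption: the real system
# queries a search engine and reads hit counts instead).
TOY_CORPUS = [
    "The electron is part of the atom.",
    "Every nucleus is part of the atom.",
    "The nucleus is part of the cell.",
    "Cyclosporine is a medicine used after transplants.",
]

# Hypothetical templates modelled on the examples in the abstract.
CLOSED_PATTERNS = {  # both terms are present in the pattern
    "hyponymy": "{x} is a {y}",
    "meronymy": "{x} is part of the {y}",
}
OPEN_PATTERNS = {  # only one term is present; the other slot is explored
    "co-hyponymy": "{x} is a",
    "co-meronymy": "{x} is part of",
}


def closed_pattern_count(relation, x, y, corpus=TOY_CORPUS):
    """Count occurrences of a two-term pattern in the corpus."""
    phrase = CLOSED_PATTERNS[relation].format(x=x, y=y).lower()
    return sum(sentence.lower().count(phrase) for sentence in corpus)


def open_pattern_fillers(relation, x, corpus=TOY_CORPUS):
    """Collect the word following a one-term pattern, skipping an article."""
    prefix = re.escape(OPEN_PATTERNS[relation].format(x=x))
    regex = re.compile(prefix + r"\s+(?:(?:a|an|the)\s+)?(\w+)", re.IGNORECASE)
    fillers = Counter()
    for sentence in corpus:
        for match in regex.finditer(sentence):
            fillers[match.group(1).lower()] += 1
    return fillers


def shared_fillers(relation, x, y, corpus=TOY_CORPUS):
    """Return candidate hypernyms/holonyms found for both terms."""
    found_x = open_pattern_fillers(relation, x, corpus)
    found_y = open_pattern_fillers(relation, y, corpus)
    return set(found_x) & set(found_y)


if __name__ == "__main__":
    print(closed_pattern_count("hyponymy", "cyclosporine", "medicine"))
    print(shared_fillers("co-meronymy", "electron", "nucleus"))
```

On this toy corpus the closed hyponymy pattern for "cyclosporine" and "medicine" is found once, and the only holonym shared by "electron" and "nucleus" is "atom". How such counts are normalised into a confidence score is left open here, since the abstract does not specify it.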
lexico-syntactic patterns
semantic relations extraction from text
ontology learning


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/244997