CNR Institutional Research Information System

Combining statistical techniques and lexico-syntactic patterns for semantic relations extraction from text

Emiliano Giovannetti

2008

Abstract

Semantic relation extraction is a crucial task in Ontology Learning from Texts. In the literature, unsupervised statistical systems are commonly used for this task: they typically detect pairs of semantically related terms on the basis of their distribution in texts, without specifying the semantic relation that holds between them. In this work we propose a fully unsupervised approach to the validation and extraction of semantic relations from texts.

A statistical component (CLASS, CLustering through Analogy-based Semantic Similarity) is used to obtain a set of pairs of distributionally similar terms, that is, terms occurring in similar contexts and possibly involved in paradigmatic relations (for instance, the words "car" and "motorcycle" in the sentences "I drive my car" and "Bob drives his motorcycle"). To validate and label the anonymous relations produced by the statistical module, occurrences of the candidate term pairs are searched for on the Web in the context of reliable lexico-syntactic patterns, where the terms are involved in a syntagmatic relation (for example, the words "steer" and "car" in the sentence "steer is part of the car").

This work focuses on the definition and application of the lexico-syntactic patterns and on the measures used to assess the reliability of the specific semantic relation the system suggests. The chosen semantic relations are hyponymy, meronymy, co-hyponymy and co-meronymy, selected for their relevance to ontology construction. Different lexico-syntactic patterns are used for different kinds of relations. Patterns including both terms are used for hyponymy and meronymy discovery (e.g. "cyclosporine is a medicine"): the number of occurrences of the pattern on the Web indicates the confidence of the candidate semantic relation. For co-hyponymy and co-meronymy, explorative open patterns, which include just one term, are used. For example, given the term pair "electron-nucleus", we can check whether a co-meronymy relation holds between the two terms by applying the two patterns "electron is part of" and "nucleus is part of" and then looking for common holonyms (e.g. "atom").

For evaluation, two different measures have been defined, one for hyponymy and meronymy relations and the other for co-hyponymy and co-meronymy. The measures are built essentially on the number of occurrences of the patterns on the Web and, for co-hyponymy and co-meronymy, on the number of common hypernyms (or holonyms) shared by the terms.
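
The two kinds of patterns described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the tiny in-memory `CORPUS` stands in for the Web, the function names and pattern templates are hypothetical, and only raw counts are returned, because the abstract does not give the formulas of the two reliability measures.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Web: a handful of sentences (assumption).
CORPUS = [
    "cyclosporine is a medicine",
    "cyclosporine is a medicine used after transplants",
    "the electron is part of the atom",
    "the nucleus is part of the atom",
    "the nucleus is part of the cell",
]


def count_closed(term_a, term_b, template, corpus=CORPUS):
    """Closed pattern: both terms are filled in, then occurrences are counted."""
    phrase = template.format(a=term_a, b=term_b)
    return sum(phrase in sentence for sentence in corpus)


def open_fillers(term, template, corpus=CORPUS):
    """Open pattern: one term is filled in; collect the words that complete it."""
    prefix = re.escape(template.format(a=term))
    # Skip an optional article before capturing the completing word.
    rx = re.compile(prefix + r"\s+(?:the\s+|an?\s+)?(\w+)")
    return Counter(m.group(1) for s in corpus for m in rx.finditer(s))


def common_fillers(term_a, term_b, template, corpus=CORPUS):
    """Words completing the open pattern for both terms (e.g. shared holonyms)."""
    fillers_a = open_fillers(term_a, template, corpus)
    fillers_b = open_fillers(term_b, template, corpus)
    shared = fillers_a.keys() & fillers_b.keys()
    # Keep the smaller of the two counts as a conservative raw score.
    return {word: min(fillers_a[word], fillers_b[word]) for word in shared}
```

On this toy corpus, `count_closed("cyclosporine", "medicine", "{a} is a {b}")` returns 2, and `common_fillers("electron", "nucleus", "{a} is part of")` returns `{"atom": 1}`, mirroring the "electron-nucleus" example. Replacing the corpus with real Web hit counts, and the raw counts with the measures defined in the work, is left open here.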

DC field	Value	Language
dc.authority.people	Emiliano Giovannetti	it
dc.collection.id.s	33fc2b58-b895-438b-9d2a-2c5bc86a83a6	*
dc.collection.name	04.04 Presentation/Communication not published in conference proceedings	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/16 17:15:52	-
dc.date.available	2024/02/16 17:15:52	-
dc.date.issued	2008	-
dc.description.affiliations	Istituto di Linguistica Computazionale "A. Zampolli" - CNR	-
dc.description.allpeople	Giovannetti, Emiliano	-
dc.description.allpeopleoriginal	Emiliano Giovannetti	-
dc.description.fulltext	none	en
dc.description.numberofauthors	1	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/244997	-
dc.identifier.url	http://www.iet.unipi.it/dottinformazione/Workshop/Anno2008/English.html	-
dc.language.iso	eng	-
dc.relation.conferencedate	14 November 2008	-
dc.relation.conferencename	Advances in Computer Systems and Networks. Doctoral Workshop 2008	-
dc.relation.conferenceplace	Pisa	-
dc.relation.ispartofbook	Advances in Computer Systems and Networks. Doctoral Workshop 2008.	-
dc.subject.keywords	lexico-syntactic patterns	-
dc.subject.keywords	semantic relations extraction from text	-
dc.subject.keywords	ontology learning	-
dc.subject.singlekeyword	lexico-syntactic patterns	*
dc.subject.singlekeyword	semantic relations extraction from text	*
dc.subject.singlekeyword	ontology learning	*
dc.title	Combining statistical techniques and lexico-syntactic patterns for semantic relations extraction from text	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Conference contribution::04.04 Presentation/Communication not published in conference proceedings	it
dc.type.miur	-2.0	-
dc.type.referee	No	-
dc.ugov.descaux1	282636	-
iris.orcid.lastModifiedDate	2024/04/04 10:38:20	*
iris.orcid.lastModifiedMillisecond	1712219900066	*
iris.sitodocente.maxattempts	1	-
Appears in types:	04.04 Presentation/Communication not published (conference, event, webinar...)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/244997
