Expanding Domain-Specific Lexicons by Term Categorization

Avancini H; Sebastiani F
2003

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each c_i in a set C = {c_1, ..., c_m} of domains, a lexicon L_1^i, bootstrapping from an initial lexicon L_0^i and a set of documents given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
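The dual representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not name a learner, so a cosine-similarity centroid rule stands in for one here, and the function names (`term_vectors`, `expand_lexicons`), the `threshold` value, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Represent each term as a vector of its counts across the documents."""
    vectors = {}
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.lower().split()).items():
            vectors.setdefault(term, [0] * len(docs))[j] = count
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicons(docs, seed_lexicons, threshold=0.5):
    """Expand each domain's seed lexicon with terms close to the seed centroid.

    Assumption: a centroid rule replaces whatever supervised learner is
    actually used; `threshold` is an arbitrary illustrative cut-off.
    """
    vectors = term_vectors(docs)
    expanded = {}
    for domain, seeds in seed_lexicons.items():
        known = [vectors[t] for t in seeds if t in vectors]
        if not known:
            # No seed term occurs in the documents: nothing can be learned.
            expanded[domain] = set(seeds)
            continue
        centroid = [sum(col) / len(known) for col in zip(*known)]
        expanded[domain] = set(seeds) | {
            t for t, v in vectors.items() if cosine(v, centroid) >= threshold
        }
    return expanded


# Toy input: four documents and one seed term per domain (hypothetical data).
docs = [
    "striker scores goal",
    "goalkeeper saves goal",
    "bank raises interest rates",
    "bank cuts loan rates",
]
seeds = {"sport": {"goal"}, "finance": {"bank"}}
lexicons = expand_lexicons(docs, seeds)
```

In this toy run the seed lexicons play the role of the initial lexicons, and the returned sets play the role of the expanded ones: terms are labelled with a domain because they occur in the same documents as that domain's seed terms.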
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Term classification
Classifier Design and Evaluation
Learning
Information Search and Retrieval
Thesauruses
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/56732