Expanding Domain-Specific Lexicons by Term Categorization

Avancini H; Sebastiani F
2003

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the expansion of such lexicons as a process of learning previously unknown associations between terms and domains. The process generates, for each $c_i$ in a set $C = \{c_1, \ldots, c_m\}$ of domains, a lexicon $L_1^i$, bootstrapping from an initial lexicon $L_0^i$ and a set of documents given as input. The method is inspired by text categorization (TC), the discipline concerned with labelling natural language texts with labels from a predefined set of domains, or categories. However, while TC deals with documents represented as vectors in a space of terms, we formulate the task of term categorization as one in which terms are (dually) represented as vectors in a space of documents, and in which terms (instead of documents) are labelled with domains.
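The dual representation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`term_vectors`, `cosine`, `expand_lexicons`), the toy corpus, the raw-count weighting, and the similarity threshold are hypothetical, and the nearest-centroid rule is a simple stand-in classifier, not the learner used in the paper. Each term becomes a vector indexed by document, the seed lexicon of each domain supplies the positive training examples, and any term whose vector is close enough to a domain's centroid is added to that domain's expanded lexicon.

```python
import math
from collections import Counter, defaultdict


def term_vectors(docs):
    """Represent each term as a vector indexed by document id
    (the dual of the usual bag-of-words document vector)."""
    vectors = defaultdict(Counter)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            vectors[token][doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_lexicons(docs, seed_lexicons, threshold=0.5):
    """Return an expanded lexicon per domain, starting from the seeds.

    Stand-in learner (an assumption, not the paper's method): a term joins
    a domain if its document vector is within `threshold` cosine
    similarity of the centroid of that domain's seed terms.
    """
    vectors = term_vectors(docs)

    # One centroid per domain: the sum of the seed terms' document vectors.
    centroids = {}
    for domain, seeds in seed_lexicons.items():
        centroid = Counter()
        for term in seeds:
            centroid.update(vectors.get(term, {}))
        centroids[domain] = centroid

    expanded = {domain: set(seeds) for domain, seeds in seed_lexicons.items()}
    for term, vec in vectors.items():
        for domain, centroid in centroids.items():
            if cosine(vec, centroid) >= threshold:
                expanded[domain].add(term)
    return expanded


# Toy corpus, already stripped of function words (hypothetical data).
docs = [
    "striker scored goal",
    "goalkeeper saved goal",
    "bank raised interest rate",
    "loan carries high interest rate",
]
seeds = {"sport": {"goal"}, "finance": {"interest"}}
expanded = expand_lexicons(docs, seeds)
```

A term may be assigned to several domains or to none, since each domain is decided independently. In practice the counts would normally be replaced by a weighting scheme and the centroid rule by a trained classifier; both choices are left open here.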
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
SAC-03, 18th ACM Symposium on Applied Computing
793
797
5
Yes, but type not specified
9-12 March 2003
Melbourne, US
Term classification
Classifier Design and Evaluation
Learning
Information Search and Retrieval
Thesauruses
Work with more than 20 citations at the last evaluation.
5
restricted
Avancini, H; Lavelli, A; Magnini, B; Sebastiani, F; Zanoli, R
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
Files in this item:
prod_91003-doc_123424.pdf (authorized users only)
Description: Expanding domain-specific lexicons by term categorization
Type: Publisher's version (PDF)
Size: 128.41 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/56732