Sentence-based active learning strategies for information extraction

Esuli A;Sebastiani F
2010

Abstract

Given a classifier trained on relatively few training examples, active learning (AL) consists of ranking a set of unlabeled examples by how informative they would be, once manually labeled, for retraining a (hopefully) better classifier. An important text learning task in which AL is potentially useful is information extraction (IE), namely, the task of identifying within a text the expressions that instantiate a given concept. We contend that IE differs from other text learning tasks in that it does not make sense to rank individual items (i.e., word occurrences) for annotation, and that the minimal unit of text presented to the annotator should be an entire sentence. In this paper we propose a range of active learning strategies for IE that are based on ranking individual sentences, and experimentally compare them on a standard dataset for named entity extraction.
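As a rough illustration of what ranking sentences rather than word occurrences can look like, the sketch below scores each sentence by aggregating per-token uncertainties and sorts sentences by that score. It is not taken from the paper: the input format (one probability per token that the token instantiates the concept), the uncertainty measure, and the names `token_uncertainty`, `mean`, and `rank_sentences` are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence


def token_uncertainty(p: float) -> float:
    """Uncertainty of a binary token-level prediction (assumed measure).

    Returns 1.0 when p == 0.5 (classifier undecided) and 0.0 when p is 0 or 1.
    """
    return 1.0 - 2.0 * abs(p - 0.5)


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def rank_sentences(
    sentences: Dict[str, List[float]],
    aggregate: Callable[[Sequence[float]], float] = max,
) -> List[str]:
    """Rank sentence ids from most to least informative.

    `sentences` maps a sentence id to the hypothetical classifier's per-token
    probabilities. `aggregate` turns the token uncertainties of one sentence
    into a single sentence-level score (e.g. `max` or `mean`).
    """
    scores: Dict[str, float] = {}
    for sentence_id, probs in sentences.items():
        uncertainties = [token_uncertainty(p) for p in probs]
        # An empty sentence carries no information for the annotator.
        scores[sentence_id] = aggregate(uncertainties) if uncertainties else 0.0
    # Highest score first; ties keep their original insertion order.
    return sorted(scores, key=lambda sid: scores[sid], reverse=True)
```

Calling `rank_sentences(pool)` with the default `max` favors sentences containing at least one highly uncertain token, while `rank_sentences(pool, aggregate=mean)` favors sentences that are uncertain throughout. The top-ranked sentences would then be shown to the annotator in full.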
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
Massimo Melucci, Stefano Mizzaro, Gabriela Pasi
1st Italian Information Retrieval Workshop
41
45
Yes, but type not specified
27-28 January 2010
Padova, IT
Learning (K.3.2)
Design Methodology
Information retrieval
Information extraction
Active learning
3
restricted
Esuli, A; Marcheggiani, D; Sebastiani, F
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/63121