Deep Web pages convey very relevant information for different application domains like e-government, e-commerce, social networking. For this reason there is a constant high interest in efficiently, effectively and automatically extracting data from Deep Web data sources. In this paper we present SILA, a novel Spatial Instance Learning Approach, that allows for extracting data records from Deep Web pages by exploiting both the spatial arrangement and the presentation features of data items/fields produced by layout engines of Web browsers in visualizing Deep Web pages on the screen. SILA is independent from the internal HTML encodings of Web pages, and allows for recognizing data records in pages having multiple data regions in which data items are arranged by many different presentation layouts. Experimental results show that SILA has very high precision and recall and that it works much better than MDR and ViNTs approaches. © 2011 ACM.

SILA: A spatial instance learning approach for deep webpages

Oro E;Ruffolo M
2011

Abstract

Deep Web pages convey very relevant information for different application domains like e-government, e-commerce, social networking. For this reason there is a constant high interest in efficiently, effectively and automatically extracting data from Deep Web data sources. In this paper we present SILA, a novel Spatial Instance Learning Approach, that allows for extracting data records from Deep Web pages by exploiting both the spatial arrangement and the presentation features of data items/fields produced by layout engines of Web browsers in visualizing Deep Web pages on the screen. SILA is independent from the internal HTML encodings of Web pages, and allows for recognizing data records in pages having multiple data regions in which data items are arranged by many different presentation layouts. Experimental results show that SILA has very high precision and recall and that it works much better than MDR and ViNTs approaches. © 2011 ACM.
2011
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
978-1-4503-0717-8
deep web
instance learning
web information extraction
web wrapping
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/253596
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? ND
social impact