SILA: A spatial instance learning approach for deep webpages

Oro E;Ruffolo M
2011

Abstract

Deep Web pages convey highly relevant information for application domains such as e-government, e-commerce, and social networking. For this reason, there is sustained interest in extracting data from Deep Web data sources efficiently, effectively, and automatically. In this paper we present SILA, a novel Spatial Instance Learning Approach that extracts data records from Deep Web pages by exploiting both the spatial arrangement and the presentation features of the data items/fields that Web browser layout engines produce when rendering Deep Web pages on screen. SILA is independent of the internal HTML encoding of Web pages, and it recognizes data records in pages that have multiple data regions in which data items are arranged according to many different presentation layouts. Experimental results show that SILA achieves very high precision and recall and that it performs much better than the MDR and ViNTs approaches. © 2011 ACM.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
English
CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management
CIKM
2329
2332
978-1-4503-0717-8
http://www.scopus.com/inward/record.url?eid=2-s2.0-83055161475&partnerID=q2rCbXpz
Yes, but type not specified
deep web
instance learning
web information extraction
web wrapping
2
none
Oro E.; Ruffolo M.
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/253596
Citations
  • Scopus 9