The Deep Web consists of those structured data that are available as dynamically generated pages, typically requested through HTML forms. Deep Web pages cannot be indexed by search engines, and are notoriously difficult to query and integrate due to the limited access that they offer. We propose a novel framework for integrating Deep Web sources by means of a mediated schema that represent the underlying, distributed sources. Our goal is to compute answers to queries posed on the mediated schema. To this aim, we propose the use of techniques from the area of Distributed Information Retrieval. We discuss a novel approach to automated sampling, size estimation and selection of Deep Web sources, as well as a technique for merging result lists.

Integration of Deep Web Sources: A Distributed Information Retrieval Approach

Straccia U
2017

Abstract

The Deep Web consists of those structured data that are available as dynamically generated pages, typically requested through HTML forms. Deep Web pages cannot be indexed by search engines, and are notoriously difficult to query and integrate due to the limited access that they offer. We propose a novel framework for integrating Deep Web sources by means of a mediated schema that represent the underlying, distributed sources. Our goal is to compute answers to queries posed on the mediated schema. To this aim, we propose the use of techniques from the area of Distributed Information Retrieval. We discuss a novel approach to automated sampling, size estimation and selection of Deep Web sources, as well as a technique for merging result lists.
2017
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
7th International Conference on Web Intelligence, Mining and Semantics (WIMS-17)
33:1
33:4
4
978-1-4503-5225-3
http://doi.acm.org/10.1145/3102254.3102291
Sì, ma tipo non specificato
19-22 June 2017
Deep web
information integration
1
partially_open
Calì A.; Straccia U.
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
prod_374719-doc_125855.pdf

solo utenti autorizzati

Descrizione: wims17
Tipologia: Versione Editoriale (PDF)
Dimensione 1.17 MB
Formato Adobe PDF
1.17 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
prod_374719-doc_147626.pdf

accesso aperto

Descrizione: Integration of Deep Web Sources: A Distributed Information Retrieval Approach
Tipologia: Versione Editoriale (PDF)
Dimensione 826.56 kB
Formato Adobe PDF
826.56 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/342715
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact