Integration of Deep Web Sources: A Distributed Information Retrieval Approach

Straccia U
2017

Abstract

The Deep Web consists of structured data that are available as dynamically generated pages, typically requested through HTML forms. Deep Web pages cannot be indexed by search engines, and they are notoriously difficult to query and integrate because of the limited access they offer. We propose a novel framework for integrating Deep Web sources by means of a mediated schema that represents the underlying, distributed sources. Our goal is to compute answers to queries posed on the mediated schema. To this end, we propose using techniques from the area of Distributed Information Retrieval. We discuss a novel approach to automated sampling, size estimation, and selection of Deep Web sources, as well as a technique for merging result lists.
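The abstract does not describe how the proposed size estimation works. For orientation only, the sketch below shows a classic Distributed Information Retrieval baseline for estimating the size of a source: the capture-recapture (Lincoln-Petersen) estimator, which infers the number of records from the overlap of two independent samples. This is a standard technique swapped in for illustration, not the paper's method. The function name `lincoln_petersen` and the use of hashable record identifiers are assumptions. The estimator also assumes uniform random sampling, which form-based sampling of Deep Web sources rarely achieves.

```python
from typing import Hashable, Iterable


def lincoln_petersen(sample_a: Iterable[Hashable], sample_b: Iterable[Hashable]) -> float:
    """Hypothetical helper: estimate a source's record count from two samples.

    Lincoln-Petersen estimator: N ~= |A| * |B| / |A & B|.
    Assumes both samples are independent and drawn uniformly at random
    from the source (an idealisation for Deep Web sources).
    """
    a, b = set(sample_a), set(sample_b)  # deduplicate record identifiers
    overlap = len(a & b)  # records "recaptured" in both samples
    if overlap == 0:
        raise ValueError("samples do not overlap; the estimate is undefined")
    return len(a) * len(b) / overlap


if __name__ == "__main__":
    # Two hypothetical samples of record IDs, overlapping on rec5..rec9.
    first = [f"rec{i}" for i in range(0, 10)]
    second = [f"rec{i}" for i in range(5, 15)]
    print(lincoln_petersen(first, second))  # 20.0
```

The estimate gets less reliable as the overlap shrinks, so in practice the samples must be large enough to share a reasonable number of records.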
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 978-1-4503-5225-3
Keywords: Deep web; information integration
Files in this record:
prod_374719-doc_125855.pdf (authorized users only)
Description: wims17
Type: Publisher's version (PDF)
Size: 1.17 MB
Format: Adobe PDF

prod_374719-doc_147626.pdf (open access)
Description: Integration of Deep Web Sources: A Distributed Information Retrieval Approach
Type: Publisher's version (PDF)
Size: 826.56 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/342715