Integration of Deep Web Sources: A Distributed Information Retrieval Approach
Straccia U
2017
Abstract
The Deep Web consists of structured data that are available only as dynamically generated pages, typically requested through HTML forms. Deep Web pages cannot be indexed by search engines, and they are notoriously difficult to query and integrate because of the limited access they offer. We propose a novel framework for integrating Deep Web sources by means of a mediated schema that represents the underlying, distributed sources. Our goal is to compute answers to queries posed on the mediated schema. To this end, we propose the use of techniques from the area of Distributed Information Retrieval. We discuss a novel approach to automated sampling, size estimation and selection of Deep Web sources, as well as a technique for merging result lists.

File | Description | Type | Access | Size | Format
---|---|---|---|---|---
prod_374719-doc_125855.pdf | wims17 | Publisher's version (PDF) | Authorized users only | 1.17 MB | Adobe PDF
prod_374719-doc_147626.pdf | Integration of Deep Web Sources: A Distributed Information Retrieval Approach | Publisher's version (PDF) | Open access | 826.56 kB | Adobe PDF