The web object store: an infrastructure for mining semantics from web resources and their usage
Nanni M; Silvestri F; Giannotti F; Pedreschi D
2005
Abstract
Developing methods for effective and efficient access to the information contained in large collections of digital documents is a long-standing objective of computer science research, and its importance is underscored by the growing availability of large information repositories. With the advent of the web, content delivery methods evolved into the services offered by search engines, categorization and topic search services, related-pages services, and so on. The main innovation required was a shift from content-only analysis to the combined analysis of the content and the hyperlink structure of web documents, as exemplified by the PageRank metric for document relevance. However, as the web continues to grow explosively, the limitations of the current generation of web access services are becoming clearer, notably in the poor quality and freshness of their results. The overall vision presented in this paper is the development of a new generation of services for enhanced content delivery (web search, document classification, question answering, etc.), tailored to a large-scale community of web users and based on knowledge extraction methods that enrich raw data with automatically extracted semantic information. We refer to this category of services as Usage-enhanced Web-Access services (UWA), emphasizing that they are based on a combination of web usage, web content and web structure mining. Usage data are those that the community of web users decides to share, on a privacy-preserving basis, in a participatory style. UWA applications are complex for several reasons: they deal with enormous volumes of data, with continuously incoming data streams, and with different abstractions of the data, and they apply computationally expensive data mining algorithms to those data.
The infrastructure needed to support the development of UWA applications is called, in our project, the Web Object Store (WOS): a web data management system specialized in handling web content, structure and usage data. The WOS is designed to provide persistence, compression and efficient access methods for data structures representing basic web objects (web documents, URIs, citations, and HTTP requests), and to support the development of sophisticated applications that need complex data structures and advanced analysis methods.