Web Crawling and Processing with Limited Resources for Business Intelligence and Analytics Applications

Filippo Geraci
2017

Abstract

Business intelligence (BI) is the activity of extracting strategic information from big data. For enterprises, its benefits range from lower operating costs, thanks to a more sensible internal organization, to a more productive and better-informed decision process. To be effective, BI relies heavily on the availability of a huge amount of (possibly high-quality) data. The steady decrease in the cost of acquiring, storing and analyzing large knowledge bases has motivated big companies to invest in BI technologies. Small and medium-sized enterprises (SMEs), by contrast, have so far been excluded from the benefits of BI because of their limited budgets and resources. In this paper we show that satisfactory BI activity is possible even on a small budget. Our ultimate goal is not necessarily to propose novel solutions but to provide practitioners with a sort of hitchhiker's guide to cost-effective web-based BI. In particular, we discuss how the Web can be used as a cheap yet reliable source of information, where crawling, data cleaning and classification can be achieved using a limited amount of CPU, storage space and bandwidth.
Istituto di informatica e telematica - IIT
Keywords: analytics; big data; Business Intelligence; Spam Detection; Web classification; web crawling

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/375651