A Clustering-based Approach for the Identification of Parked Domains

Giuseppe Cavaleri; Filippo Geraci
2014

Abstract

Parked domains (PDs) are domains whose owners do not intend to use them for their own activities but keep them registered in order to resell them on the secondary market for web domains. To turn the cost of the annual registration fee into a source of revenue, parked domains typically host a large number of ads, in the hope that a visitor who lands on the site by chance clicks on some of them. As parking has become widespread, many specialized companies have emerged that make parking a straightforward task, requiring only that the domain's name servers be set appropriately. Although parking is legal, it places a significant burden on crawling systems and web mining tools. Without filtering parked domains, crawlers could spend a non-negligible part of their time downloading large web sites whose content can degrade the performance of analysis algorithms. In this paper, we address the problem of compiling the list of name servers used for domain parking, so that parked domains can be discarded right after the first DNS query, before any connection to the site is made.
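As an illustration of how such a list could be used once compiled, the following is a minimal sketch and not the authors' method: it assumes the crawler already holds the NS records returned by the first DNS query for each domain, and it drops every domain delegated to a known parking name server. The function names, the host names, and the contents of the list are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def normalize(name_server: str) -> str:
    """Lower-case a name server host name and drop the trailing root dot."""
    return name_server.strip().rstrip(".").lower()


def is_parked(name_servers: Iterable[str], parking_ns: Set[str]) -> bool:
    """Return True if any of the domain's name servers is in the parking list.

    `parking_ns` is expected to contain already-normalized host names.
    """
    return any(normalize(ns) in parking_ns for ns in name_servers)


def filter_crawl_queue(ns_records: Dict[str, List[str]],
                       parking_ns: Iterable[str]) -> List[str]:
    """Keep only the domains that are not delegated to a parking name server.

    `ns_records` maps each domain to the NS records obtained from the first
    DNS query (assumed to be available; no network access happens here).
    """
    blocked = {normalize(ns) for ns in parking_ns}
    return [domain for domain, servers in ns_records.items()
            if not is_parked(servers, blocked)]


if __name__ == "__main__":
    # Hypothetical parking name servers and DNS answers, for illustration only.
    parking_list = ["ns1.parking-provider.example", "ns2.parking-provider.example"]
    answers = {
        "shop.example": ["ns1.hosting.example.", "ns2.hosting.example."],
        "forsale.example": ["NS1.Parking-Provider.example."],
    }
    print(filter_crawl_queue(answers, parking_list))
```

Because the check uses only the NS records, a domain in the list is discarded without opening any connection to its web server, which matches the goal stated in the abstract.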
Istituto di informatica e telematica - IIT
Parked domains
Web spam

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/264317