A Clustering-based Approach for the Identification of Parked Domains
Giuseppe Cavaleri;Filippo Geraci
2014
Abstract
Parked domains (PDs) are domains whose owners do not intend to use them for their own activities but keep them registered in order to resell them on the secondary market for web domains. To turn the annual registration fees into a source of revenue, parked domains typically host a large number of ads, in the hope that a visitor who lands on the site by chance will click on some of them. As parking has become widespread, many specialized companies have emerged and made it a straightforward task that only requires setting the domain's name servers appropriately. Although parking is legal, it places a heavy burden on crawling systems and web mining tools. Without filtering parked domains, crawlers may spend a non-negligible part of their time downloading bulky web sites whose content can degrade the performance of analysis algorithms. In this paper, we address the problem of compiling the list of name servers used for domain parking, so that parked domains can be discarded right after the first DNS query, before any connection to the site is made.