This paper proposes a novel fitting procedure via non-parametric kernel- based models of the probability mass function of a discrete arrival process, derived from real traffic traces of queries to a Web search engine. Most of the adopted estimation techniques for probability mass functions are based on parameter estimations for a given family of probability distri- bution functions. Conversely, the proposed procedure, jointly with a kernel-based model of the probability distribution function, doesn't need any assumptions about membership to a families of distributions, or about parameters. The fitting procedure based on the Generalized Cross-Entropy resolves a Quadratic Programming Problem. Furthermore, the estimated probability mass function can be expressed in a closed form, as a weighted sum of kernel functions. We also examine the performance of the proposed procedure via numer- ical experiments and present an example of traffic analysis with real data traffic. Results show that our estimation of the probability mass function, closely matches the empirical probability mass function. Precisely, through the procedure, both temporal and statistical characteristics, such as auto- correlation, long-range dependence, and skewness, can be well approximated.

Joint modeling of arrival process and length distribution of queries in Web search engines

Colucci M;Gotta A;Tonellotto N
2016

Abstract

This paper proposes a novel fitting procedure via non-parametric kernel- based models of the probability mass function of a discrete arrival process, derived from real traffic traces of queries to a Web search engine. Most of the adopted estimation techniques for probability mass functions are based on parameter estimations for a given family of probability distri- bution functions. Conversely, the proposed procedure, jointly with a kernel-based model of the probability distribution function, doesn't need any assumptions about membership to a families of distributions, or about parameters. The fitting procedure based on the Generalized Cross-Entropy resolves a Quadratic Programming Problem. Furthermore, the estimated probability mass function can be expressed in a closed form, as a weighted sum of kernel functions. We also examine the performance of the proposed procedure via numer- ical experiments and present an example of traffic analysis with real data traffic. Results show that our estimation of the probability mass function, closely matches the empirical probability mass function. Precisely, through the procedure, both temporal and statistical characteristics, such as auto- correlation, long-range dependence, and skewness, can be well approximated.
2016
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Web search engine
Batch arrival process
Kernel-based probability distribution models
Generalized cross entropy
Computer-communication networks
File in questo prodotto:
File Dimensione Formato  
prod_357882-doc_116941.pdf

solo utenti autorizzati

Descrizione: Joint modeling of arrival process and length distribution of queries in Web search engines
Dimensione 720.35 kB
Formato Adobe PDF
720.35 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/325589
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact