Joint modeling of arrival process and length distribution of queries in Web search engines
Cassara' P.; Colucci M.; Gotta A.; Tonellotto N.
2016
Abstract
This paper proposes a novel fitting procedure, based on non-parametric kernel-based models, for the probability mass function of a discrete arrival process derived from real traffic traces of queries to a Web search engine. Most estimation techniques adopted for probability mass functions rely on estimating the parameters of a given family of probability distribution functions. Conversely, the proposed procedure, together with a kernel-based model of the probability distribution function, needs no assumptions about membership in a family of distributions or about parameters. The fitting procedure, based on the Generalized Cross-Entropy, solves a Quadratic Programming problem. Furthermore, the estimated probability mass function can be expressed in closed form as a weighted sum of kernel functions. We also examine the performance of the proposed procedure via numerical experiments and present an example of traffic analysis with real traffic data. Results show that our estimate of the probability mass function closely matches the empirical probability mass function. Specifically, the procedure approximates well both temporal and statistical characteristics, such as autocorrelation, long-range dependence, and skewness.
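The abstract describes the estimated probability mass function as a weighted sum of kernel functions whose weights come from a quadratic program. The sketch below illustrates that general idea only, and it is not the paper's method: the Generalized Cross-Entropy objective is not given in this record, so a plain least-squares objective stands in for it. The discretized Gaussian kernel, the bandwidth, the projected-gradient solver, the synthetic counts, and every function name are assumptions made for illustration.

```python
import math
import random


def empirical_pmf(counts, support_size):
    """Relative frequency of each integer 0..support_size-1 in `counts`."""
    freq = [0.0] * support_size
    for c in counts:
        freq[c] += 1.0
    return [f / len(counts) for f in freq]


def kernel_columns(support_size, bandwidth):
    """cols[j][k] is a discretized Gaussian centred at j, evaluated at k.

    Assumed kernel choice; each column is normalised to sum to 1.
    """
    cols = []
    for j in range(support_size):
        col = [math.exp(-0.5 * ((k - j) / bandwidth) ** 2)
               for k in range(support_size)]
        total = sum(col)
        cols.append([v / total for v in col])
    return cols


def mixture(weights, cols):
    """Weighted sum of kernel columns, evaluated on the whole support."""
    n = len(cols[0])
    return [sum(w * col[k] for w, col in zip(weights, cols)) for k in range(n)]


def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    cumulative, theta = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        t = (cumulative - 1.0) / i
        if u - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def squared_error(fitted, pmf):
    return sum((f - p) ** 2 for f, p in zip(fitted, pmf))


def fit_weights(pmf, cols, iterations=500):
    """Minimise 0.5 * ||K w - p||^2 over the simplex by projected gradient.

    Stand-in quadratic program, not the Generalized Cross-Entropy one.
    """
    m, n = len(cols), len(pmf)
    # Columns sum to 1, so the largest row sum bounds the Lipschitz constant.
    step = 1.0 / max(sum(col[k] for col in cols) for k in range(n))
    w = [1.0 / m] * m
    for _ in range(iterations):
        residual = [f - p for f, p in zip(mixture(w, cols), pmf)]
        grad = [sum(c * r for c, r in zip(col, residual)) for col in cols]
        w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
    return w


def demo(seed=0):
    """Fit synthetic per-slot arrival counts (hypothetical data)."""
    rng = random.Random(seed)
    counts = [sum(rng.random() < 0.3 for _ in range(12)) for _ in range(400)]
    pmf = empirical_pmf(counts, support_size=13)
    cols = kernel_columns(support_size=13, bandwidth=1.0)
    weights = fit_weights(pmf, cols)
    return pmf, cols, weights


if __name__ == "__main__":
    pmf, cols, weights = demo()
    print("squared error:", squared_error(mixture(weights, cols), pmf))
```

Because every kernel column sums to one and the weights are constrained to the probability simplex, the fitted mixture is itself a valid probability mass function, which is the property that makes a closed-form weighted-sum representation convenient.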