Semantically-aware statistical metrics via weighting kernels

Cresci S; Tesconi M
2019

Abstract

Distance metrics between statistical distributions are widely used as an efficient means to aggregate and simplify the underlying probabilities, thus enabling high-level analyses. In this paper we investigate the collisions that can arise with such metrics, and a mitigation technique rooted in kernels. In detail, we first show that colliding functions (so-called iso-curves) are widespread across metrics and families of functions (e.g., Gaussian, heavy-tailed). We then propose a kernel-based solution for augmenting distance metrics and summary statistics, which avoids collisions and highlights semantically relevant phenomena. This study is supported by a thorough theoretical evaluation of our solution against a large number of functions and metrics, complemented by a real-world evaluation in which our solution is applied to an existing problem. Some further research avenues are also discussed. The theoretical construction and the achieved results show the soundness, viability, and quality of our proposal, which, besides being interesting on its own, also paves the way for further research in the highlighted directions.
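The collision problem and the kernel idea can be illustrated with a small sketch. It is not the authors' construction: the metric (plain L1 distance), the weighted variant d_w(p, q) = Σ_x w(x)·|p(x) − q(x)|, the Gaussian weighting kernel, and all helper names (`l1_distance`, `gaussian_kernel`, `weighted_l1_distance`, `move_mass`) are hypothetical choices made only for illustration. Two distributions that perturb a uniform reference by the same amount, one in the left tail and one in the right tail, lie at the same plain distance from it (a collision). A kernel that up-weights the region assumed to be semantically relevant separates them.

```python
import math

def l1_distance(p, q):
    """Plain L1 distance between two discrete distributions over the same bins."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def gaussian_kernel(n, center, sigma):
    """Hypothetical weighting kernel: a Gaussian bump over bin indices 0..n-1."""
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(n)]

def weighted_l1_distance(p, q, w):
    """L1 distance where each bin's contribution is scaled by the kernel weight w[x]."""
    return sum(wi * abs(pi - qi) for pi, qi, wi in zip(p, q, w))

def move_mass(p, src, dst, amount):
    """Return a copy of p with `amount` of probability moved from bin src to bin dst."""
    q = list(p)
    q[src] -= amount
    q[dst] += amount
    return q

n = 10
p = [1 / n] * n                      # reference: uniform over 10 bins
q_left = move_mass(p, 0, 1, 0.05)    # perturbation in the left tail
q_right = move_mass(p, 8, 9, 0.05)   # same-sized perturbation in the right tail

# Collision: the plain metric assigns both perturbations the same distance (0.1).
print(math.isclose(l1_distance(p, q_left), l1_distance(p, q_right)))

# Kernel centred on the right tail (assumed to be the semantically relevant region).
w = gaussian_kernel(n, center=9, sigma=1.5)
print(weighted_l1_distance(p, q_left, w) < weighted_l1_distance(p, q_right, w))
```

Running it prints `True` twice: the plain metric collides, while the kernel-weighted one ranks the right-tail change as larger. The choice of kernel encodes which deviations matter, so a different kernel would emphasize a different region.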
Istituto di informatica e telematica - IIT
online social networks security
social media analysis and mining
Statistics
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/363462
Citations: Scopus 4