Assessing inter-taxa correlations in microbial communities sampled through time. How to avoid the percentage-based cognitive illusion pitfall

Rosella Muresu
2017

Abstract

Analyzing community composition is central to many studies. Sampling a habitat at time intervals makes it possible to investigate inter-taxa correlations. However, the possibility of drawing meaningful conclusions from such data is undermined by an inherent mathematical constraint: the counts are effectively percentages. Even when expressed as discrete units (e.g. number of sequence reads), they derive from a fixed amount of DNA from that habitat. When frequencies are compared, since the total is always bound to yield 100%, the increase of one species can automatically cause an apparent decrease of all the others, irrespective of their actual dynamics. This can occur even when: a) they had no change in actual numbers; b) they also increased, but at a slower rate than the fastest-growing one. By comparing data from time-zero to time-1, one could therefore erroneously infer a negative correlation with the increasing taxon, while reality could have involved neutrality or even positive correlations, which would be overlooked or even interpreted in the opposite sense. Although Karl Pearson warned against this issue in the statistics field in 1897, comparing frequency pie charts without considering this caveat is still a habit that causes false conclusions in everyday reports. Some iterative approaches to infer correlation networks from frequencies exist but are rarely applied to published data. Here a novel method of plain data transformation in four steps is introduced, demonstrated with virtual community data and then applied to a real dataset from 32 soil microcosms in cropped pots sampled at two time intervals. Bacterial phyla dynamics and their correlations are analyzed, comparing the results of the new method with those from the naïve frequency variation analysis and showing the effect of the correction procedure on the inferable correlations. Reversals of correlations from negative to positive and changes in statistical significance are shown.
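
The percentage constraint described in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration of the pitfall, not the four-step transformation proposed in the paper, which is not specified here. The taxon names A, B, C and all abundance values are hypothetical, chosen so that A grows fast, B grows more slowly and C does not change.

```python
# Hypothetical absolute abundances at two sampling times.
# These numbers are invented for illustration; they are not data from the study.
t0 = {"A": 100, "B": 100, "C": 100}
t1 = {"A": 400, "B": 150, "C": 100}


def to_percent(counts):
    """Convert absolute counts to percentages that sum to 100."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


p0 = to_percent(t0)
p1 = to_percent(t1)

# Change in absolute numbers versus change in percentage share.
abs_change = {taxon: t1[taxon] - t0[taxon] for taxon in t0}
pct_change = {taxon: p1[taxon] - p0[taxon] for taxon in t0}

for taxon in t0:
    print(
        f"{taxon}: absolute change {abs_change[taxon]:+d}, "
        f"percentage change {pct_change[taxon]:+.2f} points"
    )
```

In this toy case B increases in absolute numbers and C is unchanged, yet both lose percentage share because A grows faster. Read from frequencies alone, both would appear to decline as A rises, which corresponds to cases b) and a) of the abstract respectively.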
Istituto per il Sistema Produzione Animale in Ambiente Mediterraneo - ISPAAM
microbial

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/336923