An entropy-based study on the mutational landscape of SARS-CoV-2 in USA: Comparing different variants and revealing co-mutational behavior of proteins

Santoni D.
First author
2024

Abstract

The COVID-19 emergency has pushed the international scientific community to use every available resource to combat the spread of the virus, understand its biology, and predict its possible evolution in terms of new variants. Since the first SARS-CoV-2 nucleotide and amino acid sequences were made available, information theory has been used to study how the viral information content changes over time and to trace the evolution of its mutational landscape. In this work we analyzed SARS-CoV-2 sequences collected mainly in the USA from March 2020 to December 2022 and computed mutation profiles of viral proteins over time through an entropy-based approach using Shannon Entropy and Hellinger distance. This representation gives an at-a-glance view of the mutational landscape of viral proteins over time and can provide new insights into the evolution of the virus from different points of view. Non-structural proteins typically showed flat mutation profiles, characterized by a very low Average mutation Entropy, while accessory and structural proteins mostly showed non-uniform, high mutation profiles, often coupled with the predominance of variants. Interestingly, the NSP2 protein, whose function is still debated, falls in the same branch as NSP14 and NSP10 in the phylogenetic tree of mutations constructed from correlations of mutation profiles, suggesting co-evolution of those proteins and a possible functional link among them. To the best of our knowledge, this is the first study based on a massive amount of data (n = 107,939,973) that analyzes the mutational landscape of SARS-CoV-2 over time from an entropy point of view and depicts a temporal mutational profile of each protein of the virus.
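
The abstract names Shannon Entropy and Hellinger distance but does not spell out how they are applied. As a rough illustration only, the sketch below uses the standard textbook definitions of both quantities on the residues observed at a single alignment position. The function names and the toy residue strings are assumptions made for this example and are not taken from the paper, whose actual pipeline may differ.

```python
from collections import Counter
from math import log2, sqrt


def residue_frequencies(column):
    """Relative frequency of each residue observed at one alignment position."""
    counts = Counter(column)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def shannon_entropy(freqs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over observed residues."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, ranging over [0, 1]."""
    residues = set(p) | set(q)
    squared = sum((sqrt(p.get(r, 0.0)) - sqrt(q.get(r, 0.0))) ** 2 for r in residues)
    return sqrt(squared / 2)


# Toy data, invented for illustration: residues seen at one position
# in two hypothetical time windows.
early = residue_frequencies("DDDDDDDD")
late = residue_frequencies("DDDDGGGG")

entropy_early = shannon_entropy(early)          # fully conserved position
entropy_late = shannon_entropy(late)            # two residues, equally frequent
shift = hellinger_distance(early, late)         # how far the distribution moved
```

Under these definitions a fully conserved position has zero entropy, and the Hellinger distance grows as the residue distribution in one time window departs from that in another, which is the kind of signal a temporal mutation profile would track.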
Istituto di Analisi dei Sistemi ed Informatica ''Antonio Ruberti'' - IASI
Co-mutations
Hellinger distance
Phylogenetic trees
SARS-CoV-2
Shannon Entropy
Variants
Files in this product:
File: santoni_2024_gene_An_entropy-based_study_on_the_mutational_landscape_of_SARS-CoV-2.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 661.26 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/509887
Citations
  • Scopus 1