Mobilizing the Biocatalysis Community for Reproducible and Reusable Data Collection
Ferrandi, Erica Elisa; Monti, Daniela; Patti, Stefania
2026
Abstract
Introduction

Importance of Sharing Experimental Data

Science is an ever-evolving endeavor: all new research builds on knowledge gained in previous studies and publications. This applies not only to theory and fundamental knowledge but also to specific data. In enzyme research, such data include protein production and folding, solubility, stability, and catalytic activity, together with specificity and stereoselectivity, as well as regulatory effects such as activation and inhibition, and kinetics, all of which are crucial for many practical reasons. In biology and biochemistry, the availability of high-quality experimental data has already contributed to several breakthroughs. One example is AlphaFold 2, (1) released in 2021, a machine learning-based tool that predicts the 3D structures of proteins with unprecedented accuracy. Its release was a major breakthrough in structural biology, addressing a challenge that had persisted for decades. A key element in the success of AlphaFold was the large number of experimental protein structures available in the Protein Data Bank (ca. 159,000 in 2019). (2) This was possible because, three decades before the AlphaFold release, depositing crystallographic, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM) structures into databases in a uniform format had become the gold standard and a strict requirement for publication. (3,4) Thanks to the high quality and large volume of its data, the Protein Data Bank also enabled the development of molecular docking and other tools. Other examples are UniProt (5) and BRENDA, (6) databases that have contributed to functional prediction tools, (7−9) metabolic modeling, (10−12) and large-scale enzyme design efforts. (13−16) Their success relies heavily on community contributions, data quality checks, and manual curation.

| File | Description | Type | Access | License | Size | Format |
|---|---|---|---|---|---|---|
| Marques_ACS2026.pdf | ACS article | Publisher's version (PDF) | Open access | Creative Commons | 1.46 MB | Adobe PDF |
| cs5c07904_si_001.xlsx | SI_001 | Other attached material | Open access | Creative Commons | 83.59 kB | Microsoft Excel XML |
| cs5c07904_si_002.xlsx | SI_002 | Other attached material | Open access | Creative Commons | 57.09 kB | Microsoft Excel XML |
| cs5c07904_si_003.xlsx | SI_003 | Other attached material | Open access | Creative Commons | 27.5 kB | Microsoft Excel XML |
| cs5c07904_si_004.xlsx | SI_004 | Other attached material | Open access | Creative Commons | 11.19 kB | Microsoft Excel XML |


