Basic Exploratory Proteins Analysis with Statistical Methods Applied on Structural Features

Eugenio Del Prete; Serena Dotolo; Anna Marabotti; Angelo Facchiano
2015

Abstract

Exploratory Data Analysis (EDA) is an approach for summarizing and visualizing the important characteristics of a data set. It provides a preliminary screening of the data and displays multivariate data graphically, making them easier to understand. It can also reveal aspects that remain hidden in simpler evaluations. In particular, EDA is well suited to datasets with comparable variables, such as structural-geometrical protein features. In this work, we analyzed a set of proteins belonging to ten different architectural families. After retrieval, feature selection and normalization stages, the dataset was processed by means of simple correlation, partial correlation and principal component analysis (PCA), highlighting family-independent or family-specific relationships, as well as possible outliers in the dataset. The results can be useful to connect these features to functional protein properties.
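As an illustration of the difference between the simple and partial correlation steps named in the abstract, the following is a minimal sketch on synthetic data. It is not the authors' implementation: the three variables are hypothetical stand-ins for structural features, the z-score is assumed as the normalization, and the first-order partial correlation formula is used for brevity (PCA is omitted).

```python
import math
import random


def zscore(values):
    """Standardize a list to zero mean and unit (sample) standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Simple (Pearson) correlation coefficient between two lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (assumption): x and y both depend on a shared variable z,
# so they correlate with each other only through z.
random.seed(0)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
x = [v + 0.5 * random.gauss(0, 1) for v in z]
y = [v + 0.5 * random.gauss(0, 1) for v in z]

# Normalization stage: put all features on a comparable scale.
x, y, z = zscore(x), zscore(y), zscore(z)

simple_xy = pearson(x, y)           # high: driven by the shared variable z
partial_xy = partial_corr(x, y, z)  # much smaller once z is controlled for

print(f"simple r(x, y)      = {simple_xy:.3f}")
print(f"partial r(x, y | z) = {partial_xy:.3f}")
```

In this toy setting the simple correlation is large while the partial correlation shrinks toward zero, which is the kind of contrast that can help separate direct relationships between features from those mediated by a third one.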
2015
Istituto di Scienze dell'Alimentazione - ISA
English
Mathematical Models in Biology
173
187
978-3-319-23496-0
Springer International Publishing
CH-6330 Cham (ZG)
Switzerland
Yes, but type not specified
Correlation; Exploratory data analysis; Global features; Principal component analysis; Protein structure
4
02 Contribution in Volume::02.01 Contribution in volume (Chapter or Essay)
268
none
Del Prete, Eugenio; Dotolo, Serena; Marabotti, Anna; Facchiano, Angelo
info:eu-repo/semantics/bookPart
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/307323