Variable ranking feature selection for the identification of nucleosome related sequences

Rizzo R; Fiannaca A; La Rosa M; Urso A
2018

Abstract

Several recent works have shown that the k-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the dimension of the feature space increases exponentially with k. This significantly affects the classification task carried out by a general machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the Variable Ranking Feature Selection method for selecting the most informative k-mers associated with a set of DNA sequences, with the final purpose of nucleosome/linker classification by a deep learning network. Results computed on three public datasets show the effectiveness of the adopted feature selection method.
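
As an illustration of the two ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It builds the 4^k-dimensional k-mer count vector of a sequence and then ranks each k-mer independently by a univariate score. The scoring criterion (absolute Pearson correlation with the class label), the function names, and the toy sequences and labels are all assumptions made for this example; the record does not state which ranking criterion the paper uses.

```python
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers of seq over the alphabet ACGT (4**k features)."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing other symbols (e.g. N) are skipped
            counts[j] += 1
    return vocab, counts


def rank_features(X, y):
    """Rank feature indices by |Pearson correlation| with the labels.

    The score is an assumed criterion chosen for illustration only.
    """
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        var_x = sum((v - mean_x) ** 2 for v in col)
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        denom = (var_x * var_y) ** 0.5
        # constant features carry no information and get score 0
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


def select_top(X, order, m):
    """Keep only the m best-ranked feature columns of X."""
    keep = order[:m]
    return [[row[j] for j in keep] for row in X]


# Hypothetical toy data: label 1 = nucleosome, label 0 = linker.
seqs = ["AAAAAA", "AAAATA", "CGCGCG", "GCGCGC"]
labels = [1, 1, 0, 0]
k = 2

vocab, _ = kmer_counts(seqs[0], k)
X = [kmer_counts(s, k)[1] for s in seqs]
order, scores = rank_features(X, labels)
X_reduced = select_top(X, order, 3)

print(len(vocab), "features before selection,", len(X_reduced[0]), "after")
print("top-ranked k-mers:", [vocab[j] for j in order[:3]])
```

Because each feature is scored on its own, the ranking costs one pass per k-mer, and the reduced matrix can be passed to any classifier in place of the full 4^k-dimensional input.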
2018
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
English
András Benczúr; Bernhard Thalheim; Tomáš Horváth; Silvia Chiusano; Tania Cerquitelli; Csaba Sidló; Peter Z. Revesz
New Trends in Databases and Information Systems
European Conference on Advances in Databases and Information Systems
909
314
324
978-3-030-00062-2
http://www.scopus.com/record/display.url?eid=2-s2.0-85053537151&origin=inward
Springer Heidelberg
Heidelberg
Germany
Yes, but type not specified
2-5 September 2018
Budapest, Hungary
Deep learning models
Feature selection
DNA sequences
Epigenomic
Nucleosomes
5
none
Lo Bosco, G; Rizzo, R; Fiannaca, A; La Rosa, M; Urso, A
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/348753