The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental for classification effectiveness, since the a priori chosen substructures may not accord with the structural properties of the XML documents. Therein, an unexplored question is how to choose the type of structural regularity that best adapts to the structures of the available XML documents. We tackle this problem through X-Class, an approach that handles all types of tree-like substructures and allows for choosing the most discriminatory one. Algorithms are designed to learn compact rule-based classifiers in which the chosen substructures discriminate the classes of XML documents.
X-Class: Associative Classification of XML Documents by Structure
Costa Gianni;Ortale Riccardo;Ritacco Ettore
2013
Abstract
The supervised classification of XML documents by structure involves learning predictive models in which certain structural regularities discriminate the individual document classes. Hitherto, research has focused on the adoption of prespecified substructures. This is detrimental for classification effectiveness, since the a priori chosen substructures may not accord with the structural properties of the XML documents. Therein, an unexplored question is how to choose the type of structural regularity that best adapts to the structures of the available XML documents. We tackle this problem through X-Class, an approach that handles all types of tree-like substructures and allows for choosing the most discriminatory one. Algorithms are designed to learn compact rule-based classifiers in which the chosen substructures discriminate the classes of XML documents.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.