On the abstraction of a categorical clustering algorithm
Sheikhalishahi M
2016
Abstract
Although clustering is one of the most common approaches in unsupervised data analysis, very little literature exists on the formalization of clustering algorithms. This paper proposes a semiring-based methodology, named Feature-Cluster Algebra, and applies it to abstract the labeled tree structure that represents a hierarchical categorical clustering algorithm named CCTree. The elements of the feature-cluster algebra are called terms. We prove that a specific kind of term, under some conditions, fully abstracts a labeled tree structure. The abstraction maps the original problem to a new representation by removing unwanted details, which makes it simpler to handle. Moreover, we present a set of relations and functions on the algebraic structure that specify the requirements a term must meet to represent a CCTree structure. The proposed formal approach can be generalized to other categorical clustering (classification) algorithms in which features play key roles in specifying the clusters (classes).