Two Step Cluster

Among two-step clustering algorithms, the first that comes to mind is the classical two-stage clustering algorithm proposed by Punj and Stewart (1983). This algorithm is a hybrid approach combining Ward's minimum-variance method with the K-means method. The advantage of such a mixed approach is that Ward's minimum-variance method supplies the number of clusters that the K-means method requires as input [1].
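The two-stage idea can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the toy dataset, the gap heuristic used to read a cluster count off the Ward dendrogram, and all parameter values are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated Gaussian blobs (illustrative only).
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 3, 6)])

# Stage 1: build Ward's minimum-variance hierarchy.
Z = linkage(X, method="ward")
# Suggest k by finding the largest jump in merge distances
# (a simple heuristic stand-in for inspecting the dendrogram).
merge_gaps = np.diff(Z[:, 2])
k = len(X) - (int(np.argmax(merge_gaps)) + 1)

# Stage 2: run K-means with the k suggested by Ward's method.
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
print(k, len(set(labels)))
```

In practice the hierarchical stage also supplies initial cluster centers for K-means, not just the cluster count; the sketch keeps only the simpler part.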


The two-step clustering algorithm is designed primarily to analyze large datasets. The algorithm groups observations into clusters based on their attributes. Compared to classical clustering algorithms, two-step cluster analysis can handle both continuous and categorical variables. In addition, this method automatically determines the optimal number of clusters.

The two-step clustering algorithm proceeds through the stages of pre-clustering, outlier handling, and clustering. During pre-clustering, each record is examined in turn and a decision is made as to whether it can join an existing pre-cluster or should start a new one. This decision is based on the distance between records; two distance measures are available, Euclidean distance and the log-likelihood distance. In the outlier-handling phase, records that could not be placed in any pre-cluster are evaluated; if a record still cannot be assigned after all placement attempts, it is set aside as an outlier. In the clustering stage, a tree structure is built and records are distributed from the root toward the leaves. Each record is attached to the nearest branch; if that node has already reached its maximum number of entries, the record is attached to the most suitable cluster on another branch according to the distance criterion.
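The pre-cluster tree described above is closely related to the clustering-feature (CF) tree used by the BIRCH algorithm, and scikit-learn's `Birch` estimator gives a rough feel for the two-stage flow. This is a sketch of the general technique, not SPSS's exact TwoStep implementation; the dataset and the `threshold` value are assumptions.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(1)
# Toy data: three separated blobs (illustrative only).
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(100, 2)) for c in (0, 2, 4)])

# Stage 1 (pre-cluster): a single scan builds a CF-tree whose leaves are
# dense sub-clusters; `threshold` controls when a record opens a new leaf.
# Stage 2 (cluster): the leaf sub-clusters are merged into final clusters.
model = Birch(threshold=0.5, n_clusters=3)
labels = model.fit_predict(X)

print(len(model.subcluster_centers_), len(set(labels)))
```

The key property shared with two-step clustering is that the expensive single pass over the full dataset produces a small set of sub-clusters, so the final grouping works on summaries rather than on every record.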

To determine the most suitable number of clusters automatically, either the BIC (Schwarz's Bayesian Information Criterion) or the AIC (Akaike's Information Criterion) is used.
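A rough analogue of this model-selection step can be shown with Gaussian mixtures, where BIC is computed for each candidate cluster count and the minimum wins. The dataset and the range of candidate k values are assumptions, and a Gaussian mixture is a stand-in for the two-step model, not its actual likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Toy data: two separated blobs (illustrative only).
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(80, 2)) for c in (0, 4)])

# Fit a model for each candidate k and keep the one minimizing BIC
# (lower BIC = better fit after penalizing model complexity).
bics = []
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics.append(gm.bic(X))
best_k = int(np.argmin(bics)) + 1

print(best_k)
```

AIC works the same way with a lighter complexity penalty (`gm.aic(X)`), so it tends to favor slightly larger cluster counts than BIC.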

REFERENCE

[1] Punj, G., & Stewart, D. W. (1983). Cluster Analysis in Marketing Research: Review and Suggestions for Application. Journal of Marketing Research, 20(2), 134–148.
