Publication:
Unsupervised learning from multi-dimensional data: A fast clustering algorithm utilizing canopies and statistical information

Institutional Authors

Özcan, Giyasettin

Publisher:

World Scientific Publ Co Pte Ltd

Abstract

In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider k-means clustering, which requires long execution times on multi-dimensional datasets. To speed up clustering without sacrificing accuracy, we introduce a new algorithm, which we term Canopy+. The algorithm utilizes canopies and statistical techniques. Its efficient initialization and normalization methodologies also contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result is accurate enough. We compared our algorithm with four popular clustering algorithms. The results indicate that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. The results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy+ algorithm benefits from early termination, which yields an additional 1.2X performance improvement.
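
The record does not include the Canopy+ procedure itself, so the following is only a minimal sketch of the two generic ideas the abstract names: a canopy pass used to seed k-means, and early termination once the clustering cost stops improving noticeably. It is not the author's algorithm. The function names, the thresholds `t1` and `t2`, the tolerance `tol`, and the toy data are all assumptions made for illustration.

```python
import math


def find_canopies(points, t1, t2):
    """Generic canopy pass (illustrative, not Canopy+).

    t1 is the loose threshold that decides canopy membership;
    t2 <= t1 is the tight threshold that retires points so they
    cannot start a new canopy.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("expected 0 <= t2 <= t1")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in remaining if math.dist(center, p) <= t1]
        canopies.append((center, members))
        # The center is at distance 0 from itself, so it is always retired.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return canopies


def kmeans_early_stop(points, seeds, max_iter=100, tol=1e-3):
    """Lloyd-style k-means that stops once the relative drop in the
    sum of squared distances falls to tol or below (assumed criterion)."""
    centers = [tuple(s) for s in seeds]
    prev_cost = None
    cost = 0.0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        cost = 0.0
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            nearest = dists.index(min(dists))
            clusters[nearest].append(p)
            cost += dists[nearest] ** 2
        # Move each center to the mean of its members; keep empty ones in place.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        if prev_cost is not None and prev_cost - cost <= tol * prev_cost:
            break  # early termination: the intermediate result is good enough
        prev_cost = cost
    return centers, cost, iterations


# Toy data (assumed): two well-separated 5 x 5 grids of points.
grid_a = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
grid_b = [(5 + x * 0.1, 5 + y * 0.1) for x in range(5) for y in range(5)]
data = grid_a + grid_b

seeds = [center for center, _ in find_canopies(data, t1=2.0, t2=1.0)]
centers, cost, iterations = kmeans_early_stop(data, seeds)
print(len(seeds), centers, iterations)
```

In this sketch, a larger `tol` stops the loop sooner and accepts a slightly worse cost, which is the kind of speed and accuracy trade-off the abstract reports for early termination.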

Keywords:

Subject

Computer science, Operations research & management science, Data mining, Multi-dimensional datasets, K-means clustering, Canopies, Normalization, Early termination

Citation

Özcan, G. (2018). "Unsupervised learning from multi-dimensional data: A fast clustering algorithm utilizing canopies and statistical information". International Journal of Information Technology and Decision Making, 17(3), 841-856.
