TY - GEN
T1 - IDEA
T2 - 2021 IEEE International Conference on Big Data, Big Data 2021
AU - Ahn, Hongryul
AU - Jung, Inuk
AU - Chae, Heejoon
AU - Oh, Minsik
AU - Kim, Inyoung
AU - Kim, Sun
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Hierarchical clustering, a traditional clustering method, has been attracting renewed attention. Among several reasons, credit goes to a 2016 paper by Dasgupta that proposed a cost function for quantitatively evaluating hierarchical clustering trees. An important question is how to combine this recent advance with existing successful clustering methods. In this paper, we propose a hierarchical clustering method that minimizes the cost function of a clustering tree by incorporating existing clustering techniques. First, we developed an ensemble tree-search method that finds an integrated tree with reduced cost by integrating multiple existing hierarchical clustering methods. Second, to handle large and arbitrarily shaped data, we designed an efficient hierarchical clustering framework, called integrating divisive and ensemble-agglomerate (IDEA), by combining the ensemble tree search with advanced clustering techniques such as nearest-neighbor graph construction, divisive-agglomerate hybridization, and dynamic tree cut. In experiments on arbitrary-shape datasets and complex biological datasets, the IDEA clustering method performed better than existing cost-minimization-based and density-based hierarchical clustering methods, both in minimizing Dasgupta's cost and in accuracy (adjusted Rand index).
KW - Divisive-agglomerate hybrid clustering
KW - Ensemble clustering
KW - Hierarchical clustering
KW - Tree cost minimization
UR - http://www.scopus.com/inward/record.url?scp=85125329865&partnerID=8YFLogxK
U2 - 10.1109/BigData52589.2021.9671953
DO - 10.1109/BigData52589.2021.9671953
M3 - Conference contribution
AN - SCOPUS:85125329865
T3 - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
SP - 2791
EP - 2800
BT - Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
A2 - Chen, Yixin
A2 - Ludwig, Heiko
A2 - Tu, Yicheng
A2 - Fayyad, Usama
A2 - Zhu, Xingquan
A2 - Hu, Xiaohua Tony
A2 - Byna, Suren
A2 - Liu, Xiong
A2 - Zhang, Jianping
A2 - Pan, Shirui
A2 - Papalexakis, Vagelis
A2 - Wang, Jianwu
A2 - Cuzzocrea, Alfredo
A2 - Ordonez, Carlos
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 December 2021 through 18 December 2021
ER -