Enhancing clustering representations with positive proximity and cluster dispersion learning

Abhishek Kumar, Dong Gyu Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Contemporary deep clustering approaches often rely on contrastive or non-contrastive techniques to learn effective representations for clustering tasks. Contrastive methods use negative pairs to obtain uniform representations but can introduce class collision problems, potentially compromising clustering performance. Non-contrastive techniques, by contrast, avoid class collisions but may produce non-uniform representations that lead to clustering collapse. In this work, we propose a novel end-to-end deep clustering approach named PIPCDR, designed to harness the strengths of both approaches while mitigating their limitations. PIPCDR incorporates a positive instance proximity loss and a cluster dispersion regularizer. The positive instance proximity loss aligns the augmented views of each instance with those of its sampled neighbors, enhancing within-cluster compactness by selecting genuinely positive pairs within the embedding space. Meanwhile, the cluster dispersion regularizer maximizes inter-cluster distances while minimizing within-cluster compactness, promoting uniformity in the learned representations. As a result, PIPCDR produces well-separated clusters and uniform representations, avoids class collision, and enhances within-cluster compactness. We extensively validate the effectiveness of PIPCDR within an end-to-end Majorize-Minimization framework, demonstrating competitive performance on moderate-scale clustering benchmark datasets and establishing new state-of-the-art results on large-scale datasets.
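The abstract does not state the exact formulations of the two PIPCDR terms, so the following is only a minimal, hypothetical sketch of the general ideas, not the authors' implementation. It assumes: (1) a BYOL-style non-contrastive alignment cost, 2 − 2·cos, applied between an instance's two augmented views and between one view and a neighbor sampled from its k nearest neighbors (cosine similarity) in the embedding space; and (2) a dispersion term taken as the mean pairwise cosine similarity between cluster centroids, so that lowering it pushes clusters apart. The function names, the use of cosine similarity, the parameter `k`, and hard cluster labels are all illustrative assumptions, and the sketch covers only the inter-cluster part of the regularizer.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def sample_neighbors(embeddings: Sequence[Sequence[float]], k: int,
                     rng: random.Random) -> List[int]:
    """For each instance, pick one of its k most cosine-similar other instances."""
    picks = []
    for i, e in enumerate(embeddings):
        ranked = sorted((j for j in range(len(embeddings)) if j != i),
                        key=lambda j: cosine(e, embeddings[j]), reverse=True)
        picks.append(rng.choice(ranked[:k]))
    return picks


def positive_proximity_loss(view_a, view_b, embeddings, k: int = 1,
                            rng: Optional[random.Random] = None) -> float:
    """Hypothetical proximity loss: no negative pairs, only positives.

    view_a[i] is pulled toward view_b[i] (its own second view) and toward
    view_b of a neighbor sampled from `embeddings`; all lists share indices.
    """
    rng = rng or random.Random(0)
    nbrs = sample_neighbors(embeddings, k, rng)
    total = 0.0
    for i in range(len(view_a)):
        total += 2 - 2 * cosine(view_a[i], view_b[i])        # instance pair
        total += 2 - 2 * cosine(view_a[i], view_b[nbrs[i]])  # neighbor pair
    return total / (2 * len(view_a))


def cluster_dispersion(embeddings, labels) -> float:
    """Hypothetical dispersion term: mean pairwise cosine similarity of centroids.

    Lower values mean cluster centroids point in more different directions.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    dim = len(embeddings[0])
    centroids = []
    for c in clusters:
        members = [normalize(e) for e, lab in zip(embeddings, labels) if lab == c]
        centroids.append([sum(m[d] for m in members) / len(members)
                          for d in range(dim)])
    sims = [cosine(centroids[a], centroids[b])
            for a in range(len(centroids)) for b in range(a + 1, len(centroids))]
    return sum(sims) / len(sims)
```

For example, with embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]`, the labeling `[0, 0, 1, 1]` yields a much lower dispersion value than the mixed labeling `[0, 1, 0, 1]`, whose two centroids point in the same direction. In training, both terms would presumably be computed on mini-batch embeddings and combined with weights; the abstract does not give that weighting.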

Original language: English
Article number: 121277
Journal: Information Sciences
Volume: 686
State: Published - Jan 2025

Keywords

  • Class collision
  • Contrastive learning
  • Deep clustering
  • Representation learning
  • Self-supervised learning
