CLUSTERING ON THE TORUS BY CONFORMAL PREDICTION

Sungkyu Jung, Kiho Park, Byungwon Kim

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Motivated by the analysis of torsion (dihedral) angles in the backbone of proteins, we investigate clustering of bivariate angular data on the torus [-π,π) × [-π,π). We show that naive adaptations of clustering methods, designed for vector-valued data, to the torus are not satisfactory and propose a novel clustering approach based on the conformal prediction framework. We construct several prediction sets for toroidal data with guaranteed finite-sample validity, based on a kernel density estimate and bivariate von Mises mixture models. From a prediction set built from a Gaussian approximation of the bivariate von Mises mixture, we propose a data-driven choice for the number of clusters and present algorithms for automated cluster identification and cluster membership assignment. The proposed prediction sets and clustering approaches are applied to the torsion angles extracted from three strains of coronavirus spike glycoproteins (including SARS-CoV-2, contagious in humans). The analysis reveals a potential difference in the clusters of the SARS-CoV-2 torsion angles, compared to the clusters found in torsion angles from two different strains of coronavirus, contagious in animals.
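
As a rough illustration of the kind of prediction set the abstract describes, the sketch below builds a density-level set on the torus from a kernel density estimate and calibrates its threshold by split (inductive) conformal prediction. This is an assumption-laden simplification and not necessarily the authors' construction: the product von Mises kernel, the concentration parameter `kappa`, the train/calibration split, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def wrap(angles):
    """Map angles (radians) onto [-pi, pi)."""
    return (angles + np.pi) % (2.0 * np.pi) - np.pi


def vm_product_kde(points, data, kappa=25.0):
    """Kernel density estimate on the torus using a product von Mises kernel.

    points: (m, 2) evaluation angles; data: (n, 2) observed angles.
    kappa is an assumed concentration (inverse bandwidth) parameter.
    """
    diff = points[:, None, :] - data[None, :, :]            # shape (m, n, 2)
    # Subtract 2*kappa inside the exponent for numerical stability.
    log_kernel = kappa * (np.cos(diff).sum(axis=2) - 2.0)
    norm = (2.0 * np.pi * np.i0(kappa) * np.exp(-kappa)) ** 2
    return np.exp(log_kernel).mean(axis=1) / norm


def split_conformal_threshold(train, calib, alpha=0.1, kappa=25.0):
    """Density threshold whose level set has coverage >= 1 - alpha
    under exchangeability (split conformal calibration)."""
    scores = vm_product_kde(calib, train, kappa)             # conformity scores
    k = int(np.floor(alpha * (len(scores) + 1)))
    if k < 1:
        return 0.0  # too few calibration points: the set is the whole torus
    return np.sort(scores)[k - 1]                            # k-th smallest score


def in_prediction_set(points, train, threshold, kappa=25.0):
    """Boolean mask: which points fall inside the prediction set."""
    return vm_product_kde(points, train, kappa) >= threshold
```

Given arrays `train` and `calib` of shape (n, 2) holding angle pairs in radians, `split_conformal_threshold(train, calib, alpha=0.1)` returns a threshold `t`, and `in_prediction_set(grid, train, t)` marks the region of the torus covered by the set. Connected components of that region could then serve as candidate clusters, which is the general idea the abstract points to.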

Original language: English
Pages (from-to): 1583-1603
Number of pages: 21
Journal: Annals of Applied Statistics
Volume: 15
Issue number: 4
State: Published - 2021

Keywords

  • Density estimation
  • Directional statistics
  • Prediction set
  • Protein structure
  • Torsion angles
  • Von Mises distribution
