Compositional data analysis by the square-root transformation: Application to NBA USG% data

Jeseok Lee, Byungwon Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Compositional data are data whose components sum to a constant, so the sample space is a simplex and statistical methods developed for the usual Euclidean vector space cannot be applied directly. A natural way to overcome this restriction is to apply a transformation that maps the sample space onto Euclidean space; log-ratio type transformations, such as the additive log-ratio (ALR), the centered log-ratio (CLR) and the isometric log-ratio (ILR) transformations, have been the most widely used. However, under sparsity, where some components are exactly zero, these log-ratio type transformations may not be effective. In this work, we propose an alternative, the square-root transformation, which maps the original sample space onto the directional space. We compare the square-root transformation with the log-ratio type transformations through a simulation study and a real data example. In the real data example, we apply both types of transformations to USG% data obtained from the NBA and use a density-based clustering method, DBSCAN (density-based spatial clustering of applications with noise), to show the result.
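The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (closure, sqrt_transform, clr_transform) and the two usage vectors are hypothetical and chosen only for illustration. It assumes the square-root transformation is the componentwise square root of a composition that sums to one, so the result has unit Euclidean norm and lies on a sphere, while the CLR transformation takes logarithms and therefore breaks down on exact zeros.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its components sum to 1 (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def sqrt_transform(x):
    """Componentwise square root of the closed composition.

    Because the components sum to 1, the squared entries of the result
    sum to 1, so each row lies on the unit sphere (nonnegative orthant).
    """
    return np.sqrt(closure(x))

def clr_transform(x):
    """Centered log-ratio: log of each component minus the row mean of the logs.

    Exact zeros give log(0) = -inf, so the affected rows become non-finite.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        logx = np.log(closure(x))
        return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical usage shares for two five-player lineups (made-up numbers).
# The second row is sparse: two components are exactly zero.
usage = np.array([
    [0.30, 0.25, 0.20, 0.15, 0.10],
    [0.40, 0.35, 0.25, 0.00, 0.00],
])

z = sqrt_transform(usage)
norms = np.linalg.norm(z, axis=1)  # unit length for every row, zeros included
c = clr_transform(usage)           # finite for row 0, non-finite for row 1

print("norms after square-root transform:", norms)
print("CLR of dense row is finite:", bool(np.isfinite(c[0]).all()))
print("CLR of sparse row is finite:", bool(np.isfinite(c[1]).all()))
```

The square-root images stay well defined for the sparse row, which is why a method that works with distances on the sphere, such as a density-based clustering, can still be applied to them under this assumption.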

Original language: English
Pages (from-to): 349-363
Number of pages: 15
Journal: Communications for Statistical Applications and Methods
Volume: 31
Issue number: 3
State: Published - 2024

Keywords

  • clustering
  • compositional data analysis
  • log-ratio transformation
  • sports data analysis
  • square-root transformation
