TY - JOUR

T1 - Compositional data analysis by the square-root transformation

T2 - Application to NBA USG% data

AU - Lee, Jeseok

AU - Kim, Byungwon

N1 - Publisher Copyright: © (2024) The Korean Statistical Society, and Korean International Statistical Society. All rights reserved.

PY - 2024

Y1 - 2024

N2 - Compositional data refers to data where the sum of the values of the components is a constant, hence the sample space is defined as a simplex making it impossible to apply statistical methods developed in the usual Euclidean vector space. A natural approach to overcome this restriction is to consider an appropriate transformation which moves the sample space onto the Euclidean space, and log-ratio typed transformations, such as the additive log-ratio (ALR), the centered log-ratio (CLR) and the isometric log-ratio (ILR) transformations, have been mostly conducted. However, in scenarios with sparsity, where certain components take on exact zero values, these log-ratio type transformations may not be effective. In this work, we mainly suggest an alternative transformation, that is the square-root transformation which moves the original sample space onto the directional space. We compare the square-root transformation with the log-ratio typed transformation by the simulation study and the real data example. In the real data example, we applied both types of transformations to the USG% data obtained from NBA, and used a density based clustering method, DBSCAN (density-based spatial clustering of applications with noise), to show the result.

AB - Compositional data refers to data where the sum of the values of the components is a constant, hence the sample space is defined as a simplex making it impossible to apply statistical methods developed in the usual Euclidean vector space. A natural approach to overcome this restriction is to consider an appropriate transformation which moves the sample space onto the Euclidean space, and log-ratio typed transformations, such as the additive log-ratio (ALR), the centered log-ratio (CLR) and the isometric log-ratio (ILR) transformations, have been mostly conducted. However, in scenarios with sparsity, where certain components take on exact zero values, these log-ratio type transformations may not be effective. In this work, we mainly suggest an alternative transformation, that is the square-root transformation which moves the original sample space onto the directional space. We compare the square-root transformation with the log-ratio typed transformation by the simulation study and the real data example. In the real data example, we applied both types of transformations to the USG% data obtained from NBA, and used a density based clustering method, DBSCAN (density-based spatial clustering of applications with noise), to show the result.

KW - clustering

KW - compositional data analysis

KW - log-ratio transformation

KW - sports data analysis

KW - square-root transformation

UR - http://www.scopus.com/inward/record.url?scp=85195660785&partnerID=8YFLogxK

U2 - 10.29220/CSAM.2024.31.3.349

DO - 10.29220/CSAM.2024.31.3.349

M3 - Article

AN - SCOPUS:85195660785

SN - 2287-7843

VL - 31

SP - 349

EP - 363

JO - Communications for Statistical Applications and Methods

JF - Communications for Statistical Applications and Methods

IS - 3

ER -