TY - JOUR
T1 - Compositional data analysis by the square-root transformation
T2 - Application to NBA USG% data
AU - Lee, Jeseok
AU - Kim, Byungwon
N1 - Publisher Copyright: © 2024 The Korean Statistical Society, and Korean International Statistical Society. All rights reserved.
PY - 2024
Y1 - 2024
AB - Compositional data are data in which the components sum to a constant; hence the sample space is a simplex, which makes it impossible to directly apply statistical methods developed for the usual Euclidean vector space. A natural approach to overcoming this restriction is to apply an appropriate transformation that maps the sample space onto a Euclidean space, and log-ratio-type transformations, such as the additive log-ratio (ALR), centered log-ratio (CLR) and isometric log-ratio (ILR) transformations, have been the most commonly used. However, in sparse settings, where certain components take exact zero values, these log-ratio-type transformations may not be effective, because the logarithm of zero is undefined. In this work, we propose an alternative, the square-root transformation, which maps the original sample space onto the directional space. We compare the square-root transformation with the log-ratio-type transformations through a simulation study and a real data example. In the real data example, we apply both types of transformations to USG% data obtained from the NBA and use a density-based clustering method, DBSCAN (density-based spatial clustering of applications with noise), to present the results.
KW - clustering
KW - compositional data analysis
KW - log-ratio transformation
KW - sports data analysis
KW - square-root transformation
UR - http://www.scopus.com/inward/record.url?scp=85195660785&partnerID=8YFLogxK
U2 - 10.29220/CSAM.2024.31.3.349
DO - 10.29220/CSAM.2024.31.3.349
M3 - Article
AN - SCOPUS:85195660785
SN - 2287-7843
VL - 31
SP - 349
EP - 363
JO - Communications for Statistical Applications and Methods
JF - Communications for Statistical Applications and Methods
IS - 3
ER -