TY - JOUR
T1 - Zero inflated high dimensional compositional data with DeepInsight
AU - Lee, Jeseok
AU - Kim, Byungwon
N1 - Publisher Copyright: © 2025 Lee, Kim. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2025/4
Y1 - 2025/4
N2 - Through the Human Microbiome Project, research on human-associated microbiomes has been conducted in various fields. New sequencing techniques such as Next Generation Sequencing (NGS) and High-Throughput Sequencing (HTS) have made it possible to capture a wide range of microbiome features. These advances have also contributed to the development of numerical proxies such as Operational Taxonomic Units (OTUs) and Amplicon Sequence Variants (ASVs). Studies using such microbiome data often face two problems: zero inflation and high dimensionality. This study addresses both problems while following the recent emphasis on compositional interpretation of microbiome data. To handle zero inflation in compositional microbiome data, we mapped the data onto the surface of the hypersphere using a square root transformation. Then, to handle high dimensionality, we modified DeepInsight, a method that converts non-image data into images for Convolutional Neural Networks (CNNs), to fit the hypersphere space. Furthermore, to resolve the common issue of distinguishing true zero values from fake zero values in zero-inflated images, we added a small value to the true zero values. We validated our approach using pediatric inflammatory bowel disease (IBD) fecal sample data and achieved an area under the curve (AUC) value of 0.847, which is higher than the previous study’s result of 0.83.
UR - https://www.scopus.com/pages/publications/105003082704
U2 - 10.1371/journal.pone.0320832
DO - 10.1371/journal.pone.0320832
M3 - Article
C2 - 40238826
AN - SCOPUS:105003082704
SN - 1932-6203
VL - 20
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e0320832
ER -