TY - JOUR
T1 - A pruning strategy of reference panels for fast SNP genotype imputation
AU - Jadamba, Erkhembayar
AU - Shin, Miyoung
AU - Chung, Myungguen
AU - Park, Kiejung
PY - 2013/3
Y1 - 2013/3
N2 - In recent genome-wide association studies, genotype imputation for missing SNPs is a common procedure for increasing the power of observed genetic markers. Imputation usually employs publicly available resources, such as the International HapMap Project data or the 1000 Genomes Project data, as a reference panel. However, the volume of publicly available resources is increasing rapidly with the maturation of high-throughput genotyping technology. Learning from large reference panels therefore often requires heavy computation, leading to long imputation times. To address this problem, we propose a pruning strategy for constructing imputation reference panels that reduces the size of the reference panel by excluding (pruning) somewhat redundant samples, identified by estimating the kinship coefficients between samples. For evaluation, this approach was implemented under the Beagle framework and tested on two real datasets, Mao et al.'s prostate cancer data and KNIH's diabetes data. Our experimental results show that the proposed pruning strategy for reference panel construction can shorten imputation time without loss of imputation accuracy.
AB - In recent genome-wide association studies, genotype imputation for missing SNPs is a common procedure for increasing the power of observed genetic markers. Imputation usually employs publicly available resources, such as the International HapMap Project data or the 1000 Genomes Project data, as a reference panel. However, the volume of publicly available resources is increasing rapidly with the maturation of high-throughput genotyping technology. Learning from large reference panels therefore often requires heavy computation, leading to long imputation times. To address this problem, we propose a pruning strategy for constructing imputation reference panels that reduces the size of the reference panel by excluding (pruning) somewhat redundant samples, identified by estimating the kinship coefficients between samples. For evaluation, this approach was implemented under the Beagle framework and tested on two real datasets, Mao et al.'s prostate cancer data and KNIH's diabetes data. Our experimental results show that the proposed pruning strategy for reference panel construction can shorten imputation time without loss of imputation accuracy.
KW - Kinship coefficient
KW - Reference panel pruning
KW - SNP imputation
UR - http://www.scopus.com/inward/record.url?scp=84875285617&partnerID=8YFLogxK
U2 - 10.1007/s13206-013-7102-2
DO - 10.1007/s13206-013-7102-2
M3 - Article
AN - SCOPUS:84875285617
SN - 1976-0280
VL - 7
SP - 6
EP - 10
JO - Biochip Journal
JF - Biochip Journal
IS - 1
ER -