TY - JOUR
T1 - Identification of significant gene-sets differentially expressed in a specific disease by co-expressed functional gene modules generation
AU - Kim, Jaeyoung
AU - Shin, Miyoung
PY - 2010
Y1 - 2010
N2 - In recent microarray studies, gene-set analysis has been one of the most popular computational approaches for finding gene-sets that show significantly differential expression between case and control groups of samples. For this purpose, it first employs a variety of biological resources, such as pathway databases, gene ontology, and the literature, to generate candidate functional gene-sets. From these candidates, the most significant ones are then identified by selecting the gene-sets whose expression difference between the case and control groups has sufficiently high statistical significance. Here the significance of each gene-set is usually evaluated based on a representative score obtained from the expression profiles of its constituent genes. In practice, however, the representative score for a gene-set may not readily capture the overall characteristics of the expression patterns of its constituent genes. For example, some genes in a specific functional gene-set may show expression patterns very different from those of the majority of genes in the same gene-set. In such a case, those genes weaken the representative score for the gene-set, which eventually hinders the estimation of its statistical significance. To handle this problem, we propose employing gene modules, groups of genes that not only share a specific function but are also strongly correlated with each other, as the candidate functional gene-sets for gene-set analysis. Specifically, from each gene-set of the same functionality, we attempt to filter out the "bad" genes, whose expression patterns are not strongly correlated with those of the majority of genes in the same gene-set, by generating co-expressed functional gene modules from each gene-set. Also, a nonparametric Wilcoxon rank-sum test is employed to evaluate the significance of these gene modules. Our experiments show that the proposed approach of generating co-expressed functional modules for gene-set analysis can greatly improve performance in identifying significant gene-sets differentially expressed in a specific disease.
AB - In recent microarray studies, gene-set analysis has been one of the most popular computational approaches for finding gene-sets that show significantly differential expression between case and control groups of samples. For this purpose, it first employs a variety of biological resources, such as pathway databases, gene ontology, and the literature, to generate candidate functional gene-sets. From these candidates, the most significant ones are then identified by selecting the gene-sets whose expression difference between the case and control groups has sufficiently high statistical significance. Here the significance of each gene-set is usually evaluated based on a representative score obtained from the expression profiles of its constituent genes. In practice, however, the representative score for a gene-set may not readily capture the overall characteristics of the expression patterns of its constituent genes. For example, some genes in a specific functional gene-set may show expression patterns very different from those of the majority of genes in the same gene-set. In such a case, those genes weaken the representative score for the gene-set, which eventually hinders the estimation of its statistical significance. To handle this problem, we propose employing gene modules, groups of genes that not only share a specific function but are also strongly correlated with each other, as the candidate functional gene-sets for gene-set analysis. Specifically, from each gene-set of the same functionality, we attempt to filter out the "bad" genes, whose expression patterns are not strongly correlated with those of the majority of genes in the same gene-set, by generating co-expressed functional gene modules from each gene-set. Also, a nonparametric Wilcoxon rank-sum test is employed to evaluate the significance of these gene modules. Our experiments show that the proposed approach of generating co-expressed functional modules for gene-set analysis can greatly improve performance in identifying significant gene-sets differentially expressed in a specific disease.
KW - Gene modules
KW - Gene-set analysis
KW - Microarray
KW - Significant gene-sets
KW - Wilcoxon rank-sum test
UR - http://www.scopus.com/inward/record.url?scp=79959685553&partnerID=8YFLogxK
U2 - 10.1007/s13206-010-4307-5
DO - 10.1007/s13206-010-4307-5
M3 - Article
AN - SCOPUS:79959685553
SN - 1976-0280
VL - 4
SP - 204
EP - 209
JO - Biochip Journal
JF - Biochip Journal
IS - 3
ER -