TY - GEN
T1 - A gene ranking method using text-mining for the identification of disease related genes
AU - Lee, Hyungmin
AU - Shin, Miyoung
AU - Hong, Munpyo
PY - 2010
Y1 - 2010
N2 - For the identification of significant genes involved in specific diseases, microarray gene expression profiles have been widely used to prioritize candidate genes. In this paper, we propose a new gene ranking method that employs gene-gene relations extracted from literature along with gene expression scores obtained from microarrays. Here the gene-gene relations are extracted by taking a hybrid approach which is a combination of syntactic analysis and co-occurrence based approaches. Specifically, we perform the syntactic parsing on the text and then, within each clause of the parsed sentence, the co-occurred gene names are considered to be mutually related. Both the gene network derived from the gene-gene relations obtained in the above way and the gene expression scores are given as the inputs to the GeneRank algorithm. For the evaluation of our approach, we conducted experiments with the publicly available prostate cancer data. The results show that our method is superior in the precision and the recall to the original GeneRank which employs the gene-gene relations built from gene ontology annotations. Furthermore, our hybrid approach to the gene-gene relation extraction produces better prioritization of truly disease-related genes in top ranks than the existing popular co-occurrence approach.
AB - For the identification of significant genes involved in specific diseases, microarray gene expression profiles have been widely used to prioritize candidate genes. In this paper, we propose a new gene ranking method that employs gene-gene relations extracted from literature along with gene expression scores obtained from microarrays. Here the gene-gene relations are extracted by taking a hybrid approach which is a combination of syntactic analysis and co-occurrence based approaches. Specifically, we perform the syntactic parsing on the text and then, within each clause of the parsed sentence, the co-occurred gene names are considered to be mutually related. Both the gene network derived from the gene-gene relations obtained in the above way and the gene expression scores are given as the inputs to the GeneRank algorithm. For the evaluation of our approach, we conducted experiments with the publicly available prostate cancer data. The results show that our method is superior in the precision and the recall to the original GeneRank which employs the gene-gene relations built from gene ontology annotations. Furthermore, our hybrid approach to the gene-gene relation extraction produces better prioritization of truly disease-related genes in top ranks than the existing popular co-occurrence approach.
KW - Disease related genes
KW - Gene ranking
KW - Microarray data analysis
KW - Relation extraction
KW - Text-mining
UR - http://www.scopus.com/inward/record.url?scp=79952406066&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2010.5706616
DO - 10.1109/BIBM.2010.5706616
M3 - Conference contribution
AN - SCOPUS:79952406066
SN - 9781424483075
T3 - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
SP - 493
EP - 498
BT - Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
T2 - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
Y2 - 18 December 2010 through 21 December 2010
ER -