Identify predictive SNP groups in genome wide association study: A sparse learning approach

Zhuo Zhang, Yanwu Xu, Jiang Liu, Chee Keong Kwoh

Research output: Journal Publication › Conference article › peer-review

4 Citations (Scopus)


A Genome-Wide Association Study (GWAS) aims to identify genetic variants that are significantly associated with genetic traits. GWAS data often contain 0.5 to 1 million Single Nucleotide Polymorphisms (SNPs) genotyped from thousands of individuals. To analyze such data, stringent statistical significance thresholds are pre-defined for multiple testing adjustment, e.g., p-value < 10^-8 for single-SNP detection and at least < 10^-12 for SNP-SNP interaction detection. Such stringent thresholds are used for computational efficiency, but they hinder the discovery of many true genetic variants, and more practical approaches are needed to conduct GWAS. In this paper, we propose a machine learning approach to identify groups of predictive SNPs in GWAS analysis. Our method differs from other methods in that it first translates genomic knowledge into SNP groupings as priors, then selects a list of the most predictive SNP groups using linear regression regularized by group-sparse constraints, solved by Group-lasso (Least Absolute Shrinkage and Selection Operator). The selected SNP groups compose a sparse feature space that yields higher predictive power for continuous trait prediction. We conduct experiments on the SiMES (Singapore Malay Eye Study) data set, with 3280 Malay individuals genotyped on Illumina 610 quad arrays. We investigate one discrete trait (glaucoma) and two glaucoma-related quantitative traits, optic Cup-to-Disc Ratio (CDR) and Intraocular Pressure (IOP). The hypothesis is that, with more biological knowledge embedded, a learning mechanism yields higher predictive power. Our preliminary results support this hypothesis. Further analysis reveals that our approach can identify groups of SNPs highly associated with a particular genetic trait, in spite of the small sample size and the incomplete biological knowledge.
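The group-sparse regression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the proximal-gradient solver, the sqrt(group size) penalty weight, the function name `group_lasso`, the value of `lam`, and the synthetic genotype data are all assumptions chosen for illustration. It shows how a group-lasso penalty zeroes out whole SNP groups at once, leaving only the groups that carry signal.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2
    by proximal gradient descent (hypothetical helper, not from the paper).

    groups: list of integer index arrays, one per SNP group.
    """
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, with L the Lipschitz constant of the least-squares gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        for idx in groups:
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(len(idx))
            # Block soft-thresholding: shrink the whole group, or set it to zero.
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        w = z
    return w

# Synthetic stand-in for genotype data (0/1/2 minor-allele counts); not SiMES data.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 12)).astype(float)
X -= X.mean(axis=0)

# Three assumed SNP groups of four SNPs each; only the first group affects the trait.
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
w_true = np.zeros(12)
w_true[:4] = [1.0, -1.0, 0.5, 0.8]
y = X @ w_true + 0.1 * rng.standard_normal(200)
y -= y.mean()

w_hat = group_lasso(X, y, groups, lam=0.1)
selected = [g for g, idx in enumerate(groups) if np.any(w_hat[idx] != 0.0)]
print("selected groups:", selected)
```

In practice the grouping would come from biological priors (for example, SNPs mapped to the same gene), and `lam` would be tuned by cross-validation; both choices are left open here.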

Original language: English
Pages (from-to): 107-114
Number of pages: 8
Journal: Procedia Computer Science
Publication status: Published - 2012
Externally published: Yes
Event: 3rd International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2012 - Bangkok, Thailand
Duration: 3 Oct 2012 → 5 Oct 2012


Keywords

  • Genome wide association study (GWAS)
  • Group-lasso
  • Least absolute shrinkage and selection operator (lasso)
  • Regularized linear regression
  • Single nucleotide polymorphism (SNP)

ASJC Scopus subject areas

  • General Computer Science

