SNP discovery

From Applied Bioinformatics Group
Revision as of 00:59, 24 September 2010 by Appbio (talk | contribs)

Launch autoSNPdb

autoSNPdb2 under development

Molecular genetic markers describe genetic variation and provide a link between observed phenotypes and the underlying genotype. Single Nucleotide Polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence, are generally abundant in populations and have a low mutation rate. However, SNP markers can be costly to develop, especially where resequencing from multiple individuals is required. Mining readily available sequence data significantly reduces the costs associated with SNP discovery, and several methods have been developed for SNP discovery from sequence data.

Where sequence trace files are available, polymorphisms in traces of dubious quality can be filtered out, and software such as PolyBayes and PolyPhred is the most efficient means of differentiating between true SNPs and sequence errors. Where trace files are unavailable, SNP confidence can be estimated by two further methods: redundancy of the polymorphism in an alignment, and co-segregation of SNPs with haplotype.

The frequency of occurrence of a polymorphism at a particular locus provides a measure of confidence that the SNP represents a true polymorphism, and is referred to as the SNP redundancy score. In addition, true SNPs that represent divergence between homologous genes co-segregate to define a conserved haplotype. A co-segregation score, based on whether a SNP position contributes to defining a haplotype, is a further independent measure of SNP confidence. Together, the SNP redundancy score and the co-segregation score provide a valuable means of estimating confidence in the validity of SNPs within aligned sequences, independent of sequence trace files. Two methods currently apply a combination of redundancy and haplotype co-segregation: autoSNP (Barker et al., 2003; Batley et al., 2003) and SNPServer (Savage et al., 2005).
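
The two confidence measures described above can be illustrated with a small Python sketch. This is not the autoSNP implementation: the function names, the `min_redundancy` threshold and the toy reads are assumptions made for illustration, and the co-segregation test used here (two columns split the reads into identical groups) is a simplified stand-in for the published scoring.

```python
from collections import Counter

BASES = "ACGT"


def candidate_snps(alignment, min_redundancy=2):
    """Return {column: {base: count}} for columns where at least two
    different bases are each seen in min_redundancy or more reads.
    min_redundancy is an illustrative threshold, not an autoSNP default."""
    snps = {}
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in BASES)
        supported = {b: n for b, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            snps[col] = supported
    return snps


def redundancy_score(allele_counts):
    """Number of reads supporting the least frequent supported allele."""
    return min(allele_counts.values())


def _partition(alignment, col):
    """Group read indices by the base they carry at one column."""
    groups = {}
    for idx, seq in enumerate(alignment):
        if seq[col] in BASES:
            groups.setdefault(seq[col], set()).add(idx)
    return groups


def cosegregates(alignment, col_a, col_b):
    """True when both columns split the reads they share into the same groups."""
    part_a = _partition(alignment, col_a)
    part_b = _partition(alignment, col_b)
    shared = set().union(*part_a.values()) & set().union(*part_b.values())
    groups_a = {frozenset(g & shared) for g in part_a.values() if g & shared}
    groups_b = {frozenset(g & shared) for g in part_b.values() if g & shared}
    return groups_a == groups_b


def cosegregation_score(alignment, snps, col):
    """Count the other candidate SNPs that define the same read grouping."""
    return sum(
        1 for other in snps
        if other != col and cosegregates(alignment, col, other)
    )


# Toy alignment (made up for this example): columns 1 and 4 vary together,
# while the single T at column 5 looks like a sequencing error.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACGTAC",
    "ATGTGC",
    "ATGTGC",
    "ACGTAT",
]

snps = candidate_snps(reads)
for col, alleles in sorted(snps.items()):
    print(col, alleles, redundancy_score(alleles),
          cosegregation_score(reads, snps, col))
```

In this toy alignment the variant seen in only one read is discarded by the redundancy threshold, while the two remaining columns support each other because they divide the reads into the same two groups.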

We have implemented the SNP discovery software autoSNP within a relational database to enable the efficient mining of the identified polymorphisms and the detailed interrogation of the data. AutoSNP was selected because it does not require sequence trace files and is thus applicable to a broader range of species and datasets. The results from autoSNP have previously been integrated with additional data such as gene annotation (Love et al., 2004) and the wheat SNP database CerealsDB. However, this is the first development of an integrated system for SNP discovery, analysis and interrogation.

The implementation of autoSNPdb allows researchers to query the results of SNP analysis to characterise SNPs between specific groups of individuals or within genes with predicted function. The system is flexible: researchers may add further levels of annotation and perform novel queries specific to their area of interest.
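
As a rough illustration of the kind of query this enables, the Python sketch below builds a tiny in-memory SQLite database and asks for SNPs that differ between two individuals within genes whose annotation matches a keyword. The table and column names, the cultivar labels and the rows are all hypothetical and chosen only for the example; they do not reflect the actual autoSNPdb schema or data.

```python
import sqlite3

# Hypothetical schema: NOT the real autoSNPdb layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE snp (
    snp_id INTEGER PRIMARY KEY,
    contig TEXT,
    position INTEGER,
    redundancy_score INTEGER,
    cosegregation_score INTEGER
);
CREATE TABLE allele (
    snp_id INTEGER REFERENCES snp(snp_id),
    cultivar TEXT,
    base TEXT
);
CREATE TABLE annotation (
    contig TEXT,
    description TEXT
);
""")

# Toy rows, invented for illustration only.
conn.executemany(
    "INSERT INTO snp VALUES (?, ?, ?, ?, ?)",
    [(1, "contig_1", 120, 3, 2), (2, "contig_2", 45, 2, 0)],
)
conn.executemany(
    "INSERT INTO allele VALUES (?, ?, ?)",
    [
        (1, "cultivar_X", "A"), (1, "cultivar_Y", "G"),
        (2, "cultivar_X", "C"), (2, "cultivar_Y", "C"),
    ],
)
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("contig_1", "putative protein kinase"),
     ("contig_2", "putative protein kinase")],
)


def snps_between(conn, cultivar_a, cultivar_b, keyword):
    """SNPs where the two cultivars carry different bases, restricted to
    contigs whose annotation contains the keyword."""
    query = """
        SELECT DISTINCT s.snp_id, s.contig, s.position
        FROM snp AS s
        JOIN allele AS a ON a.snp_id = s.snp_id AND a.cultivar = ?
        JOIN allele AS b ON b.snp_id = s.snp_id AND b.cultivar = ?
        JOIN annotation AS n ON n.contig = s.contig
        WHERE a.base <> b.base AND n.description LIKE ?
        ORDER BY s.snp_id
    """
    params = (cultivar_a, cultivar_b, f"%{keyword}%")
    return conn.execute(query, params).fetchall()


print(snps_between(conn, "cultivar_X", "cultivar_Y", "kinase"))
```

Keeping alleles and annotation in separate tables is what would let a further annotation layer be added as a new table and joined in, without changing the SNP records themselves.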


References:

  • Batley J, Barker G, O'Sullivan H, Edwards KJ and Edwards D. (2003) Mining for Single Nucleotide Polymorphisms and Insertions/Deletions in Maize Expressed Sequence Tag Data. Plant Physiology 132: 84-91
  • Barker G, Batley J, O'Sullivan H, Edwards KJ and Edwards D. (2003) Redundancy Based Detection of Sequence Polymorphisms in Expressed Sequence Tag Data using AutoSNP. Bioinformatics 19: 421-422
  • Savage D, Batley J, Erwin T, Logan E, Love CG, Lim GAC, Mongin E, Barker GLA, Spangenberg GC and Edwards D. (2005) SNPServer: A Realtime SNP Discovery tool. Nucleic Acids Research 33: D656-D659
  • Duran C, Appleby N, Vardy M, Imelfort M, Edwards D and Batley J. (2009) Single Nucleotide Polymorphism Discovery in Barley using AutoSNPdb. Plant Biotechnology Journal 7 (4): 326-333
  • Imelfort M, Duran C, Batley J and Edwards D. (2009) Discovering genetic polymorphisms in next generation sequencing data. Plant Biotechnology Journal 7 (4): 312-317
  • Duran C, Appleby N, Edwards D and Batley J. (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Current Bioinformatics 4: 16-27
  • Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J and Edwards D. (2009) AutoSNPdb: An Annotated Single Nucleotide Polymorphism Database for Crop Plants. Nucleic Acids Research 37: 951-953

