Two features characterize our SNP (Single Nucleotide Polymorphism) discovery strategy in both the resistance and adaptation parts of our project. First, we sequence cDNA (complementary DNA, copied from RNA transcripts), which ensures that we skip over introns, find relevant amino-acid-changing variants, and are more likely to survey entire genes. Second, we effectively sequence diploid DNA (to the extent that RNA expression does not differ between alleles). This doubles the effective sample size compared to sequencing from haploid megagametophytes, the approach taken by our UC Davis co-funders.
We have implemented automated identification of heterozygous positions at SNP loci from chromatograms. In these high-throughput SNP analyses, chromatograms are read by PHRED, trimmed by PHRAP, and visualized in CONSED. We developed this pipeline in consultation with our UC Davis co-funder. However, the diploid nature of our sequences raises special problems that can only be resolved by manual inspection. A commercial version of these programs is distributed as a stand-alone application. This program, “CodonCode Aligner”, has a user-friendly interface with additional graphic outputs that allow visual inspection of sequence quality. We use it to verify “high interest” SNPs after they have passed through our high-throughput pipeline.
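As an illustration of what calling a heterozygous position from a chromatogram involves, the following minimal Python sketch compares the two tallest peaks at a single position. It is not the PHRED/PHRAP/CONSED pipeline described above: the function name `call_position`, the input format (a dictionary of peak heights per base), and the 0.3 secondary-to-primary peak ratio are all assumptions made for the example.

```python
BASES = "ACGT"

# Standard IUPAC ambiguity codes for two-base heterozygotes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def call_position(peaks, min_ratio=0.3):
    """Call one chromatogram position from its peak heights.

    peaks     : dict mapping base -> peak height (hypothetical input format)
    min_ratio : assumed threshold; a secondary peak at least this fraction
                of the primary peak is reported as a heterozygote
    """
    # Rank the four bases by peak height, tallest first.
    ranked = sorted(BASES, key=lambda b: peaks.get(b, 0), reverse=True)
    primary, secondary = ranked[0], ranked[1]
    top = peaks.get(primary, 0)
    if top <= 0:
        return "N"  # no signal at this position
    if peaks.get(secondary, 0) / top >= min_ratio:
        return IUPAC[frozenset((primary, secondary))]
    return primary
```

A fixed ratio of this kind is only a first pass; positions flagged this way would still need the visual verification of trace files described above.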
Candidate Gene Selection, Primer Design, and Primer Testing
For white spruce resistance, we focus on genes identified in the Functional Genomics program of our project. Since these genes are members of large gene families, we developed a special algorithm that designs primers unique to a specific gene family member. This requires a full inventory of gene family members, which is provided by our large EST and FLcDNA collection. We have designed sequencing primers for over 1500 candidate genes. Over half of these have been tested in the wet lab; they belong to the following groups: phenylpropanoid pathway genes, terpenoid/isoprenoid pathway genes, AP2 sequences, and chitinase. About 60% of the tested primers have yielded single PCR products. For those that do not give single products, we move on to different genes rather than redesign primers. Our next focus will be on single-copy (COS-like) genes whose nucleotide variation patterns show high dN/dS ratios, as well as on resistance genes with high joint protein-gene-metabolite expression profiles.
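The idea of a primer that is unique to one gene family member can be sketched as follows. This is not the project's algorithm, which is not described here; it is a minimal illustration under stated assumptions: the function names, the forward-strand-only comparison, and the requirement of at least three mismatches against every other family member (`min_diff=3`) are all hypothetical.

```python
def min_mismatches(primer, seq):
    """Smallest number of mismatches between primer and any same-length
    window of seq (forward strand only, no gaps)."""
    n = len(primer)
    if len(seq) < n:
        return n  # primer cannot anneal along its full length
    return min(
        sum(a != b for a, b in zip(primer, seq[i:i + n]))
        for i in range(len(seq) - n + 1)
    )

def is_member_specific(primer, target, paralogs, min_diff=3):
    """True if primer matches the target exactly and differs from every
    other family member by at least min_diff bases (assumed threshold)."""
    if primer not in target:
        return False
    return all(min_mismatches(primer, p) >= min_diff for p in paralogs)
```

For example, `is_member_specific("CCCCGGGG", "AAAACCCCGGGGTTTT", ["AAAACCCCGGGATTTT"])` is false, because the second sequence contains a site only one base away from the primer. A practical design would also check the reverse strand and weight mismatches near the 3' end.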
SNP Discovery and Resequencing
In white spruce, the strategy to find SNPs is to sequence a panel of 12 diploid individuals. This panel is large enough to detect moderate-frequency SNPs (allele frequency of at least 0.10) with high probability (95%). We define “detection” as the occurrence of the less frequent SNP allele in at least two individuals: two heterozygotes, one heterozygote and one homozygote, or two homozygotes. A variant genotype that occurs in only one individual is interpreted as a sequencing artefact.
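The relationship between panel size, allele frequency, and detection probability can be explored with a short calculation. The sketch below is an illustration only and makes assumptions that the text does not state: Hardy-Weinberg genotype proportions, independent sampling of individuals, and error-free genotype calls. Under those assumptions it computes the probability that at least two panel members carry the less frequent allele; the value it returns depends on these assumptions and on how detection is counted, so it need not match the figure quoted above.

```python
from math import comb

def carrier_probability(q):
    """Probability that a diploid individual carries at least one copy of
    an allele with frequency q, assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - q) ** 2

def detection_probability(n, q, min_carriers=2):
    """Probability that at least min_carriers of n sampled individuals
    carry the allele (binomial sampling of independent individuals)."""
    c = carrier_probability(q)
    # Sum the probability of seeing fewer carriers than required.
    p_fewer = sum(
        comb(n, k) * c ** k * (1.0 - c) ** (n - k)
        for k in range(min_carriers)
    )
    return 1.0 - p_fewer

if __name__ == "__main__":
    for panel in (12, 24):
        print(panel, round(detection_probability(panel, 0.10), 3))
```

Setting `min_carriers=1` gives the probability of seeing the allele at all, which shows how much the two-individual rule costs in detection power.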
As of June 2007, we have sequenced single-banded PCR products from cytochrome P450, cinnamic acid 4-hydroxylase, phenylalanine ammonia-lyase, caffeate O-methyltransferase, chorismate synthase-like, phenylcoumaran benzylic ether reductase, dirigent proteins, and terpene synthases. We have analyzed 5995 chromatograms for white spruce, obtained using 809 allele-specific primers, and 7655 chromatograms for Sitka spruce (839 primers), accumulated from October 2006 to date. We found 162 SNPs in Sitka spruce (A/C 11.73%, C/G 15.43%, A/T 7.41%, A/G 29.63%, C/T 23.46%, G/T 12.35%) and 461 in white spruce (A/C 12.36%, C/G 14.53%, A/T 9.54%, A/G 29.28%, C/T 25.38%, G/T 8.89%). The computer-generated results are subsequently confirmed by inspecting the trace files for each polymorphic site in CodonCode Aligner.
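The substitution classes reported above can be summarized as transitions (A/G and C/T) versus transversions (the other four classes). The following sketch does this for the percentages given in the text; the variable and function names are illustrative, and the summary itself is an addition for the reader rather than an analysis reported by the project.

```python
TRANSITIONS = {"A/G", "C/T"}  # purine<->purine and pyrimidine<->pyrimidine

# Percentages of SNP classes as reported in the text.
sitka = {"A/C": 11.73, "C/G": 15.43, "A/T": 7.41,
         "A/G": 29.63, "C/T": 23.46, "G/T": 12.35}
white = {"A/C": 12.36, "C/G": 14.53, "A/T": 9.54,
         "A/G": 29.28, "C/T": 25.38, "G/T": 8.89}

def transition_share(spectrum):
    """Fraction of SNPs that are transitions, normalized by the total so
    that rounding in the reported percentages does not matter."""
    total = sum(spectrum.values())
    ti = sum(v for k, v in spectrum.items() if k in TRANSITIONS)
    return ti / total

def ti_tv_ratio(spectrum):
    """Ratio of transitions to transversions."""
    share = transition_share(spectrum)
    return share / (1.0 - share)

if __name__ == "__main__":
    for name, spec in (("Sitka spruce", sitka), ("white spruce", white)):
        print(name, round(transition_share(spec), 3), round(ti_tv_ratio(spec), 2))
```

In both species, transitions make up slightly more than half of the reported SNPs.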
In Sitka spruce, a panel of 24 individuals from across the species range serves as the discovery panel; these individuals originate from six widely distributed geographical populations. A sample size of 24 takes into account the larger geographical differentiation in Sitka spruce, whose range spans latitudes from California to Alaska. These same genes will also be sequenced in the 12-member white spruce panel discussed above.