Fishing for variants in the deep end of the gene pool: OGT's custom bait designs

Monday 21 May 2012

Jolyon Holdstock, Simon Hughes and Daniel Swan


Abstract

Oxford Gene Technology (OGT) has extensive expertise in probe design for solid and liquid phase hybridisation and is now applying this experience to the design of custom baits for targeted sequencing. Targeted custom sequencing offers significant benefits over whole genome or whole exome sequencing including:

  • Enabling the focused sequencing of particular regions of interest 
  • Greater depth of sequencing coverage 
  • Reduced cost 
  • Simpler data analysis 
  • Shorter time to results 
  • Potential to study larger number of patients

However, custom bait design is not straightforward, and incorrect bait design can render results unusable. This is where OGT’s custom bait designs can add significant value, by ensuring:

  • Increased depth of coverage 
  • Decreased off-target noise 
  • Increased sensitivity for variant detection 
  • Improved capture of GC rich targets and even coverage across the entire region of interest

Introduction

The ongoing development of next generation sequencing (NGS) technologies has given researchers the ability to screen a single patient for tens of thousands of sequence variants of possible clinical relevance simultaneously. NGS offers the possibility of identifying aneuploidy, unbalanced chromosomal rearrangements, sub-chromosomal deletions or duplications, loss of heterozygosity, SNPs and indels, as well as the more difficult to detect copy-neutral variants (e.g. balanced chromosomal inversions or translocations). In contrast with other technologies, NGS can scan for disease-causing variants without a priori information about the causative genes, and gives hugely increased sensitivity at single base resolution.

Exome and custom targeted approaches to sequencing have already had a major impact on the diagnosis of disease by permitting the successful identification of causal mutations for a number of monogenic disorders1-3 as well as for some cancers4,5. There has also been success in using exome screening for complex disorders6 and targeted sequencing in assessing personal disease risk7.

This application note examines how OGT’s custom bait design can facilitate sequencing projects by:

  • Targeting specific regions of interest for variant detection (rather than whole genome or exome) 
  • Improving call accuracy by increasing depth of coverage (>1000 fold versus ≤30) 
  • Enabling high-throughput processing of samples at reduced cost (i.e. hundreds of custom samples rather than tens of exomes or even fewer whole genomes per sequencing lane) 
  • Increasing sensitivity for variant detection 
  • Streamlining data processing by only analysing your regions of interest rather than an exome's or genome's worth of clinically irrelevant sequence data

What to sequence

Project design, part of OGT’s Targeted Sequencing service, is essential to a successful sequencing project and starts with the selection of the most appropriate genomic content for your study.

  • The human genome is 3 billion base pairs and when sequenced at 30x coverage allows for 6 genomes to be sequenced in a single run*. 
  • The human exome, which is 1.5% of the human genome and corresponds to gene encoding regions, when sequenced at 30x coverage allows for ~200 exomes to be sequenced in a single run (depending on multiplexing capacity)*. 
  • Custom targets ranging from 0.2-34Mb when sequenced at 30x coverage allow for several hundred to tens of thousands of samples to be sequenced in a single run (depending on multiplexing capacity)*.

* At OGT sequencing is performed using the Illumina HiSeq 2000, running the latest chemistry.
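
The throughput figures above can be reproduced with some simple arithmetic. The sketch below is illustrative only: the run yield is back-calculated from the "6 genomes at 30x" figure rather than taken from an instrument specification, and the 50% on-target fraction for captured samples is an assumption chosen because it is consistent with the ~200 exomes quoted above. The function and constant names are hypothetical.

```python
def samples_per_run(target_bp, depth, run_yield_bp, on_target_fraction=1.0):
    """Estimate how many samples fit in one run at a given mean depth.

    on_target_fraction is the share of sequenced bases that land on the
    target (1.0 for whole genome sequencing, lower for captured samples).
    """
    bases_needed = target_bp * depth / on_target_fraction
    return int(run_yield_bp // bases_needed)


GENOME_BP = 3_000_000_000   # human genome size quoted in the text
EXOME_BP = 45_000_000       # 1.5% of the genome
DEPTH = 30

# ASSUMPTION: run yield inferred from "6 genomes at 30x per run" (540 Gb).
RUN_YIELD_BP = 6 * GENOME_BP * DEPTH

# ASSUMPTION: half of the reads from a captured library are on target.
CAPTURE_ON_TARGET = 0.5

if __name__ == "__main__":
    print("genomes:", samples_per_run(GENOME_BP, DEPTH, RUN_YIELD_BP))
    print("exomes:", samples_per_run(EXOME_BP, DEPTH, RUN_YIELD_BP, CAPTURE_ON_TARGET))
    for target_bp in (200_000, 34_000_000):  # the 0.2-34Mb custom range
        n = samples_per_run(target_bp, DEPTH, RUN_YIELD_BP, CAPTURE_ON_TARGET)
        print(f"custom {target_bp / 1e6:g}Mb:", n)
```

Under these assumptions the custom range works out at a few hundred samples for a 34Mb design and tens of thousands for a 0.2Mb design, before any limit imposed by multiplexing capacity.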

Custom Sequencing

While both whole genome and whole exome sequencing generate large amounts of data, sequencing more of the genome is not always better; indeed, many of the findings from whole genome studies would have been discovered more quickly, more cost effectively and with lower data complexity using an exome-based approach. Similarly, custom targeted sequencing provides a logical, focused approach rather than the nonselective sequencing of uninformative regions. As a consequence, analysis of custom regions offers significant benefits for some studies including:

  • Enabling the sequencing of non-coding regions, or focus on particular candidate regions (exonic and intronic), identified by genome-wide association studies (GWAS)
  • Much greater depth of sequencing coverage, increasing the chance of mutation detection when studying heterogeneous tumour samples or circulating cell free tumour or foetal DNA
  • When combined with high-throughput processing, custom sequencing offers an attractive area for NGS diagnostic development and the advantage of quicker turn-around of samples at reduced cost with simpler data analysis and the potential to study larger numbers of patients

The importance of accuracy

Targeted approaches using off-the-shelf exome capture kits can lead to significant imbalances in sequence coverage. Achieving 99.999% accuracy in calling heterozygous bases (assuming no allelic bias) requires a minimum depth of 25 reads at the site of interest. However, a typical off-the-shelf exome capture run may only have 70% of bases covered at ≥25x depth.
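
One simple way to see why read depth matters for heterozygous calls is a binomial model. The sketch below is not necessarily the calculation behind the figure quoted above: it assumes each read samples the two alleles with equal probability (no allelic bias), ignores sequencing error, and uses a hypothetical calling rule that requires at least three reads supporting the alternate allele.

```python
from math import comb


def prob_enough_alt_reads(depth, min_alt_reads, alt_fraction=0.5):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, alt_fraction)."""
    p_too_few = sum(
        comb(depth, k) * alt_fraction**k * (1.0 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def min_depth_for(target_prob, min_alt_reads, alt_fraction=0.5, max_depth=1000):
    """Smallest depth at which the alternate allele is seen often enough."""
    for depth in range(min_alt_reads, max_depth + 1):
        if prob_enough_alt_reads(depth, min_alt_reads, alt_fraction) >= target_prob:
            return depth
    return None  # target not reachable within max_depth


if __name__ == "__main__":
    # ASSUMPTION: a call needs at least 3 alternate-allele reads.
    for depth in (10, 25, 50):
        print(depth, prob_enough_alt_reads(depth, 3))
    print("minimum depth for 99.999%:", min_depth_for(0.99999, 3))
```

With a threshold of three supporting reads, this simplified model reaches 99.999% at a depth of 25, which is in line with the requirement stated above. A stricter threshold or any allelic bias would push the required depth higher.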

Intelligent bait design with OGT

Custom bait design requires extensive optimisation of the capture probes to ensure the entire region of interest receives even coverage. OGT’s expertise in probe design for solid and liquid phase hybridisation and >10 years of experience in microarray design and analysis ensures that we can add significant value in this area.

When attempting to increase sequencing coverage there are two options. The first is to perform more runs on your platform of choice and increase coverage by generating more reads. However, this increases the coverage of all targets proportionally, inflating costs and still not guaranteeing good data for hard to capture (and thus hard to sequence) regions.

The second option, offered by OGT, is to carry out refined, intelligent design of capture baits. This can increase the coverage of hard to sequence loci without increasing the amount of sequencing that needs to be performed. This is a cost-effective way to generate more even coverage and increase the power to detect variants.

Bait design considerations

Designing baits for sequence capture is not a straightforward process. Bait design software is freely available but not generally user-friendly. The draft designs generated by such software often need additional refinement before the baits are ready to be used in an experimental setting. It is easy to create potential sources of capture bias by creating regions of interest (ROIs) that are too short, or that are affected by thermodynamic properties such as GC content or melting temperature (Tm). OGT has extensive experience in designing oligonucleotide probes and this allows us to provide bait capture designs that minimise these issues, giving the best possible opportunity for variant detection.

All that is required to start the bait design process is a list of genes or chromosomal regions and the genome build version on which these are based. An initial draft is produced and then assessed for coverage of the ROIs, bait distribution and sequence complexity. Iterative rounds of improvement are then applied to the design, correcting for singleton baits (regions spanning less than 120 bases and thus covered by a single bait) by addition of baits to the design to ensure even coverage in these regions. GC content is calculated for all baits and where extreme biases of GC content are identified (baits with GC <40% or >65%) additional copies of these baits are added. This corrects a common issue in targeted capture where regions of extreme GC content lead to reduced coverage8. Similarly, Tm is also calculated for each bait, and where Tm is extreme (e.g. >75°C) additional copies of these baits are also added.
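
The refinement rules described above can be sketched as a simple per-bait check. This is an illustration of the general idea, not OGT's design pipeline: the thresholds come from the text, but the function names, the number of extra copies added per rule and the treatment of short regions as a simple copy-number boost are all assumptions. Tm is passed in by the caller because the text does not say how it is calculated.

```python
BAIT_LENGTH = 120      # bait length implied by the singleton definition above
GC_LOW, GC_HIGH = 0.40, 0.65
TM_MAX = 75.0          # degrees C


def gc_fraction(seq):
    """Fraction of G and C bases in a bait sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def copies_for_bait(seq, region_length, tm=None, extra_copies=1):
    """Return how many copies of a bait to place in the design.

    ASSUMPTION: each rule that fires adds `extra_copies` copies; the real
    boosting factors are not given in the text.
    """
    copies = 1
    gc = gc_fraction(seq)
    if gc < GC_LOW or gc > GC_HIGH:
        copies += extra_copies   # extreme GC content
    if tm is not None and tm > TM_MAX:
        copies += extra_copies   # extreme melting temperature
    if region_length < BAIT_LENGTH:
        copies += extra_copies   # singleton: region covered by a single bait
    return copies


if __name__ == "__main__":
    balanced = "ACGT" * 30    # 50% GC
    gc_rich = "GC" * 60       # 100% GC
    print(copies_for_bait(balanced, region_length=500))
    print(copies_for_bait(gc_rich, region_length=100, tm=80.0))
```

In practice such a check would be applied in iterative rounds, with coverage of the ROIs reassessed after each round as described above.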

Custom baits in action

OGT has designed custom baits and compared their performance to publicly available exome data from the 1000 Genomes Project9 and whole exome data from OGT on the HapMap sample NA12878.

Increased depth of coverage

Figure 1 shows coverage for a representative exon. OGT custom baits give a roughly 3.6 fold increase in coverage over the 1000 Genomes whole exome data (1024x vs. 282x at the centre of the capture target). The OGT whole exome capture has similar depth to the 1000 Genomes data at this position (251x). OGT custom baits will generally provide 3–5.5 fold more coverage than a standard exome capture.

Decreased off-target noise

Figure 2 shows how the OGT bait design decreases off-target noise. Reducing off-target hits increases the certainty that variations observed are true positives and biologically relevant, removing SNPs that have been called in intronic or extragenic regions.

Increased sensitivity for variant detection

As depth increases, the peak tail off reduces, which allows nucleotides towards the outer edge of the capture regions to be assayed more accurately for variations. Figure 3 shows a deletion that would not be detected from whole exome capture, but is clearly seen in the OGT custom bait capture.

Figure 1: Depth of coverage is increased with OGT custom baits (above) vs. 1000 Genomes whole exome capture (below). The yellow boxes show the total read count covering the position, and the distribution of nucleotides at that position, along with their strand distribution.

Figure 2: Custom baits reduce hybridisation artefacts in off-target regions, OGT custom baits design (above) vs. 1000 Genomes whole exome (below).

Figure 3: OGT custom bait capture (above) vs. 1000 Genomes whole exome capture (below). The increased read depth of custom baits allows detection of a deletion at the edge of the target region.

Increased depth also increases the number of accurate calls. The example in Figure 4 shows a SNP that is unambiguously detected with OGT custom capture baits, but is not detected in the whole exome capture in the same analysis pipeline, despite the whole exome capture having a good read depth and allelic spread of the heterozygous SNP that is present at this location.

Figure 4: Even at >50x coverage, whole exome capture (below) does not accurately identify all SNPs, whereas the increased coverage with OGT custom capture baits allows the variant to be detected.

Improved capture of GC rich targets

GC content bias is a known issue in whole exome capture, where areas of high GC content (>65%) are under-represented due to thermodynamic constraints of the hybridisation.

OGT’s custom bait designs are refined to compensate for this bias, which often affects the capture of first exons, as these tend to be GC rich relative to the rest of the transcribed sequence. Figures 5 and 6 show this clearly. Figure 5 shows the first exon of HDAC10, which is targeted for capture by an OGT custom bait design and the Agilent SureSelect 50Mb kit. The data show a patient sample vs. a test HapMap sample at the same site, with a GC content of 70% in the target interval.

Figure 5: OGT custom bait capture of a region with 70% GC content showing a maximum read depth of 50x (above). The Agilent SureSelect 50Mb kit does not capture any reads in this region (below).

Figure 6 shows a region of 65% GC content sequenced with capture by both OGT custom baits (above) and the Agilent SureSelect 50Mb kit (below). Whilst the Agilent SureSelect kit captures 20x coverage on the leftmost target region, it has very low coverage of the two capture regions to the right of the figure. In contrast, OGT’s custom baits in this GC biased region achieve a coverage of 425x.

Figure 6: Relative capture of targets within a single gene. Agilent coverage is 20x for the target with no GC content bias, and minimal for targets with a GC content of 65%. In contrast, OGT custom baits perform excellently in this region.

Summary

Whilst whole exome sequencing offers a powerful route into analysis of Mendelian disorders and provides a platform for GWAS, custom designs offer significant advantages where the biological question is more focused, such as GWAS follow-up or mutational analysis of specific pathways or genes in a clinical context.

With the increased focus on a smaller number of targets a number of advantages are realised including:

  • Higher sample throughput with increased multiplexing opportunities 
  • Increased read depth and decreased off-target noise leading to improved sensitivity and specificity for variant detection 
  • Decreased computational complexity of analysis 
  • The ability to capture and sequence regions not covered by whole exome capture kits 
  • Increased cost-efficiency saving money or allowing more samples

References

  1. Choi, M. et al (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences of the United States of America 106, 19096-19101 
  2. Ng, S.B. et al (2010) Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics 42, 30-35 
  3. Ng, S.B. et al (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-276 
  4. Wei, X. et al (2011) Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nature Genetics 43, 442-446 
  5. Yan, X.J. et al (2011) Exome Sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nature Genetics 43, 309-315 
  6. Lehne, B. et al (2011) Exome localisation of complex disease association signals. BMC Genomics 12:92 
  7. Klassen, T. et al (2011) Exome Sequencing of Ion Channel Genes Reveals Complex Profiles Confounding Personal Risk Assessment in Epilepsy. Cell, 145, 1036-1048 
  8. Tewhey, R. et al (2009) Enrichment of sequencing targets from the human genome by solution hybridisation. Genome Biology 10(10): R116 
  9. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073
