Exploring the cytogenetic realm
Review of OGT’s ASHG workshop 2009
Copy number variation (CNV) data are essential for interpreting the human genome and its ability to tolerate, or its susceptibility to, mutational load. Being able to cross-reference such data would bring many scientific benefits, including much earlier identification of disorders that cannot be definitively determined from the phenotype within the first few years of life (e.g. autism). Generating the necessary data, though, requires a coordinated effort from laboratories around the globe to collect, analyse and share results from healthy people as well as from those with defined diagnoses. Research in this area is progressing rapidly, and a number of public-access databases are beginning to provide the resources required. These include the Database of Genomic Variants (DGV), which focuses on healthy (non-phenotypic) genomes, and DECIPHER, which provides details of genomic variations in clinical subjects. The International Standards for Cytogenomic Arrays (ISCA) Consortium has also developed a standardised data platform and array format for CNV data collection and distribution, focused primarily on developmental disorders.
Array comparative genomic hybridisation (aCGH) is the primary tool used for CNV research, as it is technologically straightforward and current platforms offer high flexibility, sensitivity and specificity. Whole genomes can be probed on a single array, with areas of particular interest analysed at higher resolution (smaller spacing between probes). Furthermore, printing technologies now allow more than one array to be printed per slide, increasing throughput and cost-efficiency. For example, the CytoSure ISCA aCGH arrays from Oxford Gene Technology (OGT) are available in 8 x 60k, 4 x 180k and 4 x 44k formats (Figure 1).
Figure 1: OGT’s ISCA arrays are available in a range of formats to suit all throughput requirements.
To build the most complete view of variation within the genome, and therefore to determine which CNVs are benign and which are pathogenic, it is essential to analyse as many samples as possible. This is where process automation becomes invaluable, since manual methods limit the achievable throughput. Laboratory-scale automation, as provided, for example, by SciGene (available through OGT in Europe), enables researchers to process more arrays in less time, but is still not ideal for large cohort studies. For such large-scale studies, OGT offers the Genefficiency service, which is capable of processing over 2000 samples per week.
At the 2009 ASHG meeting in Honolulu, HI, OGT hosted a workshop outlining some of the latest aCGH findings on the genetic basis of developmental diseases. An overview of the OGT Genefficiency service was also presented, together with a summary of the largest CNV study conducted to date.
Of one or many?
Many diseases have clearly defined aetiologies, and any genetic effects are often ‘single gene’ and clearly reflected in the phenotype and pathophysiology, e.g. Duchenne and Becker muscular dystrophy (DMD and BMD), cystic fibrosis (CF) and sickle cell anaemia (drepanocytosis). Many syndromic disorders, such as Down’s syndrome, which combine different pathologies because multiple genes are involved, have clear genetic changes that make diagnosis definitive. However, a large number of neurological developmental disorders lack clear aetiologies and are generally diagnosed by phenotype, often through psychological assessment. Similar psychological phenotypes are sometimes grouped under umbrella terms or placed on scales of severity, e.g. the autism spectrum disorders (ASD) and schizophrenia. This variability is apparent not only between individuals but also within the same individual over time, such that severity or even diagnosis can change as the patient grows older. What causes such varying phenotypes is the subject of many ongoing investigations, and cytogenetic techniques such as aCGH are at the forefront of efforts to understand their complex genetic origins.
Mutational burden analysis
At the OGT seminar in Honolulu, Dr Jonathan Sebat from Cold Spring Harbor Laboratory (CSHL) described the current genetic landscape of schizophrenia, presenting compelling evidence for the role of the overall mutational burden of rare structural variants, both inherited and de novo. For example, he highlighted that novel structural variants larger than 100 kb occurred significantly less often in control subjects than in the schizophrenia study group. Further analysis showed that the lower the age of onset, the higher the occurrence of novel structural variation. These data suggest that the higher the mutational burden, the more severe the neurological outcome.
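The case-versus-control comparison described above can be sketched as a simple carrier-frequency calculation. This is a minimal illustration, not Dr Sebat's analysis pipeline: the `CnvCall` record, the `carrier_rate` helper and the example data are all hypothetical, and "novel" is crudely defined here as overlapping no interval in a reference catalogue of known variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical CNV call record; coordinates are half-open [start, end)."""
    sample_id: str
    chrom: str
    start: int
    end: int

    @property
    def size(self) -> int:
        return self.end - self.start


def overlaps(call: CnvCall, interval: tuple) -> bool:
    """True if the call shares at least one base with a (chrom, start, end) interval."""
    chrom, start, end = interval
    return call.chrom == chrom and call.start < end and start < call.end


def is_novel(call: CnvCall, catalogue: list) -> bool:
    """Crude novelty test for this sketch: no overlap with any catalogued variant."""
    return not any(overlaps(call, iv) for iv in catalogue)


def carrier_rate(samples: list, calls: list, catalogue: list, min_size: int = 100_000) -> float:
    """Fraction of samples carrying at least one novel CNV larger than min_size bp."""
    carriers = {c.sample_id for c in calls if c.size > min_size and is_novel(c, catalogue)}
    return sum(1 for s in samples if s in carriers) / len(samples)


# Hypothetical example data (not from the study).
catalogue = [("chr1", 1_000_000, 1_200_000)]
calls = [
    CnvCall("case1", "chr2", 5_000_000, 5_250_000),  # 250 kb, novel -> counted
    CnvCall("case2", "chr1", 1_050_000, 1_180_000),  # overlaps catalogue -> not novel
    CnvCall("ctrl1", "chr3", 2_000_000, 2_050_000),  # novel but only 50 kb -> too small
]
cases = ["case1", "case2"]
controls = ["ctrl1", "ctrl2"]
print(carrier_rate(cases, calls, catalogue))     # 0.5
print(carrier_rate(controls, calls, catalogue))  # 0.0
```

Samples with no calls at all (here `ctrl2`) still count in the denominator, which is why the sample list is passed separately from the calls.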
Dr Sebat went on to discuss studies of mutational hotspots and the involvement of well-documented tandem segmental duplications at 16p11.2, as well as deletions at 1q21 and 15q13. Each of these events was noted to increase the risk of developing schizophrenia, and each was also found to arise through recurrent spontaneous mutation; for example, at 15q11, 62 deletions corresponded to 32 independent mutation events.
Mutations in a number of specific genes have also been shown to be associated with a high risk of schizophrenia. For example, deletions in the NRXN1 gene (2p16.3) have been shown to arise through numerous mechanisms.
Dr Sebat concluded that these studies clearly show that rare structural copy number variants (CNVs), both inherited and de novo, are associated with schizophrenia, based on their collective frequency in cases compared with controls. Moreover, the results suggest that some rare CNVs confer substantial risk, with an approximately 10-fold increase compared with controls.
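A fold increase in risk like the one quoted above is often expressed as an odds ratio from a 2 x 2 table of carriers and non-carriers in cases and controls. The talk summary does not say which statistic was used, so the following is only a sketch of one common approach: an odds ratio with a Woolf (log-scale) 95% confidence interval. The counts are invented for illustration.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio for carrying a CNV in cases vs controls, with a Woolf (log-scale) CI.

    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction)
    so that the ratio and its standard error are defined.
    """
    cells = [case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 1,000 cases and 2 of 1,000 controls carry the CNV.
or_, lo, hi = odds_ratio_ci(20, 980, 2, 998)
print(f"OR = {or_:.1f}, 95% CI {lo:.1f}-{hi:.1f}")  # OR = 10.2
```

With only two carriers among the controls the interval is wide, which illustrates why rare-variant studies need very large cohorts.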
One locus, multiple disease associations?
As well as being clearly involved in many cases of schizophrenia, the 16p11.2 locus has also been flagged in other psychiatric and developmental disorders, including ASD, global developmental delay (GDD) and bipolar disorder. Interestingly, the data point to differential phenotypes depending on the nature of the mutation: deletions were associated with ASD and GDD, whereas duplications were seen in schizophrenia and bipolar disorder, as well as in ASD/GDD cases. This led Dr Sebat to pose the question: do autism and schizophrenia lie at opposite ends of a particular neurobiological process?
These extensive CNV studies make it clear that the genetic landscape of neurological developmental disorders is highly complex and requires further investigation. To offer additional insight, future studies must provide an extra level of detail, which can be achieved by increasing sample sizes and/or sensitivity. The growth in data collected by resources such as DGV and DECIPHER and by the ISCA consortium, combined with large studies such as the recent Wellcome Trust Case Control Consortium (WTCCC) study [1], is a great start. Higher sensitivity can be obtained through more extensive statistical analysis of results and better cross-referencing between datasets, but is best achieved by increasing the experimental sensitivity of the aCGH platform itself.
Data, data everywhere
Each sample analysed for CNVs not only benefits the investigation being conducted, but can also aid scientific understanding and improve clinical research as a whole. Ensuring that such data are available and can be interrogated is therefore very important, and a number of public databases have been established to host CNV and related genomic data.
The Database of Genomic Variants (DGV) was initially developed by the Department of Genetics and Genomic Biology at the University of Toronto. Its objective is to “…provide a comprehensive summary of structural variation in the human genome.”
The database holds only structural variants identified in healthy control samples and is therefore a useful catalogue of control data for studies aiming to correlate genomic variation with phenotype. It highlights the wide extent of variation present in the normal population.
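When a catalogue of control variants such as DGV is used to screen new calls, a common convention is to treat a call as "seen in controls" only if it shares at least 50% reciprocal overlap with a catalogued variant. The sketch below assumes intervals have already been loaded as (chromosome, start, end) tuples; it does not parse any DGV download format, and both the 0.5 threshold and the example coordinates are illustrative assumptions.

```python
def reciprocal_overlap(a: tuple, b: tuple) -> float:
    """Overlap length divided by the longer interval's length (half-open coordinates).

    A result >= 0.5 means each interval is at least 50% covered by the other.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0
    return overlap / max(end_a - start_a, end_b - start_b)


def seen_in_controls(call: tuple, control_catalogue: list, threshold: float = 0.5) -> bool:
    """True if any catalogued control variant reciprocally overlaps the call."""
    return any(reciprocal_overlap(call, v) >= threshold for v in control_catalogue)


# Hypothetical control catalogue and calls (not real DGV entries).
controls = [("chr5", 7_150_000, 7_350_000)]
similar_call = ("chr5", 7_100_000, 7_300_000)  # shares 150 kb; both intervals are 200 kb
larger_call = ("chr5", 7_000_000, 8_000_000)   # contains the 200 kb control variant, but is 1 Mb
print(seen_in_controls(similar_call, controls))  # True
print(seen_in_controls(larger_call, controls))   # False
```

The larger call is not dismissed even though it contains a known control variant, because most of its length is not accounted for by the catalogue.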
DECIPHER (DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources) was developed by the Wellcome Trust Sanger Institute and operates as a network of academic centres of clinical genetics. The database holds in-depth clinical information about chromosomal microdeletions, microduplications, insertions, translocations and inversions, and displays this information on the human genome map. It enables researchers to link phenotype and genotype information and can be used to check whether an aberration is novel.
At the OGT seminar in Honolulu, Dr David Ledbetter, from the Department of Human Genetics at Emory University, explained that, similar to the DECIPHER project, the International Standards for Cytogenomic Arrays (ISCA) Consortium is a network of more than 70 laboratories worldwide, all of which are committed to free data sharing in an international public database being developed with NCBI at the NIH. Beyond enabling data collation and access, however, ISCA is also developing standards for genotype and phenotype data, and guidelines for their interpretation.
The ISCA array format
The consortium reviewed aCGH array formats and concluded that, although the formats preferred by individual groups differed somewhat, on the whole they were very similar; it subsequently created a consensus, or community, format to ensure better consistency between research groups. The design features whole-genome coverage at a resolution of at least 250 kb, increased to 20-50 kb in known disease-relevant regions, and can be applied to 8 x 60k, 4 x 180k, 4 x 44k and 2 x 105k arrays. The OGT CytoSure ISCA-approved arrays are available in the 8 x 60k, 4 x 180k and 4 x 44k formats.
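To see roughly what these resolution figures imply for probe numbers, the arithmetic can be sketched as below. This is illustrative only and is not the actual ISCA probe layout: it assumes perfectly even probe spacing, an approximate genome length of 3.1 Gb, and a hypothetical `probes_per_call` parameter for the number of consecutive probes an aberration should span (real designs place probes unevenly and avoid repetitive sequence).

```python
GENOME_BP = 3_100_000_000  # approximate human genome length (assumption)


def probes_needed(length_bp: int, resolution_bp: int, probes_per_call: int = 1) -> int:
    """Evenly spaced probes needed so that an aberration of resolution_bp spans
    probes_per_call probes. Integer ceiling division avoids float rounding."""
    numerator = length_bp * probes_per_call
    return (numerator + resolution_bp - 1) // resolution_bp


backbone = probes_needed(GENOME_BP, 250_000)                       # 12,400 probes
backbone_3 = probes_needed(GENOME_BP, 250_000, probes_per_call=3)  # 37,200 probes
targeted = probes_needed(10_000_000, 20_000)  # hypothetical 10 Mb of targets at 20 kb: 500 probes
print(backbone, backbone_3, targeted)
```

Under these simplified assumptions, even a backbone requiring three probes per 250 kb stays below the nominal 44k feature count of the smallest listed format, which gives a rough sense of why the community design can be applied across all of them.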
Pathogenic, uncertain or benign
One of the key aims of the consortium is to enable CNVs to be classified as pathogenic or benign, rather than uncertain (as many presently are). Dr Ledbetter briefly defined the terms as follows:
- PATHOGENIC – Imbalance known to be associated with a specific disorder (targeted region), or a large imbalance
- UNCERTAIN SIGNIFICANCE – No known associated disorder, but an aberration is detected that spans coding regions. Analysis of parental samples is required to determine whether the aberration is pathogenic.
- BENIGN – Well-described common polymorphism in normal individuals (e.g. the defensin cluster on 8p)
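Dr Ledbetter's three definitions can be read as a simple decision rule. The sketch below is hypothetical: the `CnvAnnotation` fields, the order in which the rules are checked, and the 5 Mb cut-off for a "large imbalance" are assumptions made for illustration, not ISCA criteria. Calls that meet none of the definitions default to uncertain significance here.

```python
from dataclasses import dataclass

# Hypothetical cut-off for a "large imbalance"; the definitions above give no number.
LARGE_IMBALANCE_BP = 5_000_000


@dataclass
class CnvAnnotation:
    """Hypothetical pre-computed annotations for one CNV call."""
    size_bp: int
    in_known_disorder_region: bool  # overlaps a targeted, disorder-associated region
    common_polymorphism: bool       # matches a well-described benign polymorphism
    spans_coding_region: bool


def classify(cnv: CnvAnnotation, large_bp: int = LARGE_IMBALANCE_BP) -> str:
    """Apply the three definitions in a fixed (assumed) order."""
    if cnv.in_known_disorder_region or cnv.size_bp >= large_bp:
        return "pathogenic"
    if cnv.common_polymorphism:
        return "benign"
    if cnv.spans_coding_region:
        return "uncertain significance: analyse parental samples"
    # Not covered by any of the three definitions; this sketch leaves it as uncertain.
    return "uncertain significance"


print(classify(CnvAnnotation(300_000, True, False, True)))   # pathogenic
print(classify(CnvAnnotation(150_000, False, True, False)))  # benign
print(classify(CnvAnnotation(400_000, False, False, True)))  # uncertain significance: analyse parental samples
```

Checking the pathogenic rule first means a very large imbalance is flagged even if it happens to include a common polymorphism; a different ordering would be an equally valid design choice.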
An exciting area
It is clear from this brief overview that copy number variation studies offer important insight into the genetic basis of neurological disorders. Tools that improve the generation and interpretation of CNV data are therefore essential. By working with experts such as Jonathan Sebat and David Ledbetter, OGT has developed its CytoSure and Genefficiency offerings to take CNV studies to the next level. The CytoSure aCGH arrays are designed to provide excellent consistency and accuracy, enabling better aberration calling, and are available in ISCA-approved formats. As a result, they not only provide superior data for each individual research project, but also ensure that the data can be added to the global data pool with ease.
The increase in sample throughput, combined with the increasing resolution of arrays, is making the interpretation of results more complex. This is where the CytoSure Interpret Software becomes essential, offering faster and easier translation of oligo aCGH data into meaningful results, with highly flexible workflows and in-depth contextual information.
Large CNV studies require high-throughput array processing, a capability that is often not available within a research institute. The OGT Genefficiency facility has been designed for exactly such studies, providing high-quality, high-throughput aCGH processing that consistently completes over 2000 samples per week. The service was recently used by the WTCCC in the largest CNV study ever conducted [1], achieving excellent results within a challenging timeline. At the heart of Genefficiency is a proprietary laboratory information management system (LIMS), which ensures that each sample is processed correctly and passes every one of more than 40 QC checkpoints. The LIMS will not let a sample continue if it fails any QC checkpoint and, importantly, identifies the exact reason for the failure, be it a faulty heating block or a purification error, enabling rapid intervention and minimal downtime (Figure 2).
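The gating behaviour described for the LIMS (stop a sample at the first failed checkpoint and record exactly why) can be sketched as an ordered list of checks. OGT's LIMS is proprietary, so this is not its implementation: the `Checkpoint` and `run_qc` names, the two example checks and their thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Checkpoint:
    """One hypothetical QC step: check() returns None on pass or a failure reason."""
    name: str
    check: Callable[[dict], Optional[str]]


@dataclass
class QcResult:
    passed: bool
    failed_at: Optional[str] = None
    reason: Optional[str] = None
    completed: list = field(default_factory=list)


def run_qc(sample: dict, checkpoints: list) -> QcResult:
    """Run checkpoints in order; stop at the first failure and report where and why."""
    completed = []
    for cp in checkpoints:
        reason = cp.check(sample)
        if reason is not None:
            return QcResult(False, cp.name, reason, completed)
        completed.append(cp.name)
    return QcResult(True, completed=completed)


# Hypothetical checks and thresholds, for illustration only.
def dna_yield(sample: dict) -> Optional[str]:
    if sample["dna_ng"] >= 500:
        return None
    return f"DNA yield {sample['dna_ng']} ng is below 500 ng"


def hyb_temperature(sample: dict) -> Optional[str]:
    if abs(sample["hyb_temp_c"] - 65.0) <= 1.0:
        return None
    return f"hybridisation at {sample['hyb_temp_c']} C on heating block {sample['block_id']}"


CHECKPOINTS = [
    Checkpoint("DNA yield", dna_yield),
    Checkpoint("hybridisation temperature", hyb_temperature),
]

good = run_qc({"dna_ng": 800, "hyb_temp_c": 65.2, "block_id": "B3"}, CHECKPOINTS)
bad = run_qc({"dna_ng": 800, "hyb_temp_c": 61.0, "block_id": "B7"}, CHECKPOINTS)
print(good.passed)  # True
print(bad.failed_at, "-", bad.reason)
```

Because each failure reason names the checkpoint and, where relevant, the equipment involved, a faulty item (here the hypothetical block `B7`) can be traced directly from the result.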
Figure 2: Derivative log ratio spread (DLRS) of control samples run on more than 400 plates. This metric measures how reliably the data can be used to detect and call CNV aberrations. The data shown are from 19 weeks of a high-throughput CNV project, with a 98% pass rate for the control samples run on every plate. Where the data deviate from excellent quality, the causative piece of equipment or consumable can be rapidly identified using OGT’s bespoke LIMS and removed from processing, ensuring an immediate return to high-quality data.
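DLRS is commonly described as the spread of log-ratio differences between neighbouring probes, scaled to estimate per-probe noise; because a true CNV changes the log ratio only at its breakpoints, the metric is largely insensitive to real aberrations. The sketch below follows that idea under stated assumptions: it uses an interquartile-range estimate of spread and divides by sqrt(2), whereas vendor software may differ in detail, and the 0.3 pass threshold is a hypothetical example value.

```python
import math
import random
import statistics


def dlrs(log_ratios_by_chrom: dict) -> float:
    """Derivative log ratio spread of one array (sketch).

    log_ratios_by_chrom maps chromosome -> log2 ratios sorted by probe position.
    Differences are taken only between neighbouring probes on the same chromosome.
    IQR / 1.349 is a robust standard-deviation estimate for normally distributed
    values; dividing by sqrt(2) converts the spread of differences back to
    per-probe noise, since the difference of two independent noise terms has
    twice the variance of each.
    """
    diffs = []
    for ratios in log_ratios_by_chrom.values():
        diffs.extend(b - a for a, b in zip(ratios, ratios[1:]))
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return (q3 - q1) / 1.349 / math.sqrt(2)


def pass_rate(dlrs_values: list, max_dlrs: float = 0.3) -> float:
    """Fraction of arrays at or below a (hypothetical) DLRS threshold."""
    return sum(v <= max_dlrs for v in dlrs_values) / len(dlrs_values)


# Simulated array: Gaussian noise (SD 0.1) plus a real 1.0 log-ratio step on chr2.
rng = random.Random(0)
array = {
    "chr1": [rng.gauss(0.0, 0.1) for _ in range(10_000)],
    "chr2": [rng.gauss(0.0, 0.1) + (1.0 if i >= 5_000 else 0.0) for i in range(10_000)],
}
print(round(dlrs(array), 3))  # close to the simulated noise SD despite the step
```

A noise-free array containing a single large step scores a DLRS of zero, which shows why the metric tracks technical noise rather than the biological signal being measured.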
The OGT Genefficiency facility is also the first such service to be certified as a High-Throughput Certified Service Provider (HT CSP) for Agilent microarrays. This new level of Agilent certification provides official validation of OGT’s use of Agilent microarrays in a high-throughput environment and reflects OGT’s status as the supplier of choice for large-scale outsourced microarray studies.
1. Conrad DF, et al. Origins and functional impact of copy number variation in the human genome. Nature 2010; 464: 704-712.