Human Molecular Genetics Advance Access originally published online on August 28, 2007
Human Molecular Genetics 2007 16(22):2770-2779; doi:10.1093/hmg/ddm234
Optimal design of oligonucleotide microarrays for measurement of DNA copy-number
1 Department of Genome Sciences, University of Washington School of Medicine, 2 Howard Hughes Medical Institute, 1705 NE Pacific St., Seattle, WA 98195, USA and 3 Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
* To whom correspondence should be addressed at: Department of Genome Sciences, University of Washington and Howard Hughes Medical Institute, Foege Building S413A, Box 355065, 1705 NE Pacific St., Seattle, WA 98195, USA. Tel: +1 2065439526; Fax: +1 2066857301; Email: eee{at}gs.washington.edu
Received June 18, 2007; Revised August 17, 2007; Accepted August 17, 2007
| ABSTRACT |
|---|
Copy-number variants (CNVs) occur frequently within the human genome, and may be associated with many human phenotypes. If disease association studies of CNVs are to be performed routinely, it is essential that the copy-number status be accurately genotyped. We systematically assessed the dynamic range response of an oligonucleotide microarray platform to accurately predict copy-number in a set of seven patients who had previously been shown to carry between 1 and 6 copies of an ~4 Mb region of 15q12.2–q13.1. We identify probe uniqueness, probe length, uniformity of probe melting temperature, overlap with SNPs and common repeats (particularly Alu elements) and guanine homopolymer content as parameters that significantly affect probe performance. Further, we prove the influence of these criteria on array performance by using these parameters to prospectively filter data from a second array design covering an independent genomic region and observing significant improvements in data quality. The informed selection of probes which have superior performance characteristics allows the prospective design of oligonucleotide arrays which show increased sensitivity and specificity compared with current designs. Although based on the analysis of data from comparative genomic hybridization experiments, we anticipate that our results are relevant to the design of improved oligonucleotide arrays for high-throughput copy-number genotyping of complex regions of the human genome.
| INTRODUCTION |
|---|
DNA microarrays composed of oligonucleotides attached to a solid substrate are emerging as a key tool in genetic research. Initially developed for the measurement of gene expression of multiple targets in a single experiment, they are now increasingly used for array-based comparative genomic hybridization (CGH), a high-throughput method to measure DNA copy-number genome-wide.
There is emerging evidence that copy-number variations are important risk factors for human disease (1,2). Future genetic association studies will probably require genotyping platforms that not only detect single copy gains and losses, but are able to accurately predict copy-number of targets which show more complex variation. Although oligonucleotide microarrays have the potential to yield quantitative measurements necessary for the accurate genotyping of this type of variation, it is recognized that these platforms often suffer from significant amounts of experimental noise, and that signals from multiple independent probes are generally required to generate reliable data.
Many previous studies have examined the behaviour of short (25mer) mismatched oligonucleotides for SNP genotyping (3–5), leading to significant improvements in this technology. However, little work has been done to assess factors influencing the performance of arrays composed of longer oligonucleotides (length 45–85 bp). The use of such arrays is becoming increasingly widespread for the analysis of genomic copy-number variation (6,7), gene expression (8), DNA methylation (9) and the mapping of DNase I hypersensitivity sites (10), chromatin modifications (11) and DNA binding proteins (12).
Previous efforts to define sequence characteristics which can be used to predict probe performance have been unsuccessful, but were limited by the small sample size (~1000 probes) and study design (13). Here we utilize a cohort of well-characterized disease patients with known copy-number for a large region of proximal chromosome 15 as a method of assessing individual probe performance. In each of the seven aneuploid patients tested, previous cytogenetic, FISH and BAC array CGH analyses have shown the presence of between 1 and 6 copies of the region 15q12.2–q13.1 (14, S. Schwartz, unpublished data). Utilizing a dense tiling design consisting of more than 100,000 oligonucleotide probes within this region, we show that the ability of individual probes to accurately report underlying copy-number is highly variable and dependent upon a distinct set of sequence and physical properties. We further demonstrate that utilization of this knowledge enables the definition of a significantly improved probe set for the measurement of DNA copy-number at an independent locus. Based on our analysis, we propose a set of parameters for the design of optimized oligonucleotide microarrays which show increased signal and reduced noise characteristics compared with current array designs. These criteria may be used for the future development of genotyping assays that accurately assess the copy-number of specific genomic loci.
| RESULTS |
|---|
We designed a customized oligonucleotide microarray (mean density, 1 oligonucleotide/40 bp) targeted to the proximal portion of human chromosome 15q11–q14. The region is one of the most genetically unstable regions of the human genome. Recurrent microdeletions are associated with Prader-Willi/Angelman syndrome, duplications associated with autism, supernumerary marker chromosomes and a variety of other genomic disorders (15–18). We selected seven DNA samples of known copy-number ranging from 1 to 6 over a large region of ~5 Mb that we had characterized previously using BAC array CGH (14) (Fig. 1A). This region of 15q shows wide variations in GC-content, repeat content and segmental duplications, allowing various genomic landscapes and sequence properties to be assessed. To ascertain the dynamic response of the oligonucleotide array to changes in DNA copy-number, we initially calculated the mean log2 ratio of all unique probes that were contained in the common rearrangement region (Fig. 1B, Supplementary Material, Table S1). While the amplitude of log2 ratios fell far short of the theoretical prediction (i.e. –1 for a haploid deletion, +0.58 for a duplication and +1 for a triplication), mean log2 ratios were significantly correlated with underlying copy-number (R² = 0.9732), indicating that the overall array data accurately reflect changes in the input DNA.
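The theoretical log2 ratios quoted above follow directly from the ratio of test to reference copy-number. The short Python sketch below reproduces them under the assumption that hybridization signal scales linearly with copy-number and that the reference sample is diploid; the function name is illustrative and is not taken from the study.

```python
import math


def expected_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Theoretical log2(test/reference) ratio for a locus.

    Assumes signal is proportional to copy-number (an idealization;
    the measured amplitudes reported in the text are much smaller).
    """
    if copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(copies / reference_copies)


# Copy-number states 1 to 6, as carried by the samples in this study.
for copies in range(1, 7):
    print(copies, round(expected_log2_ratio(copies), 2))
```

A haploid deletion (1 copy), a duplication (3 copies) and a triplication (4 copies) correspond to the three values given in the text.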
Individual probe performance is highly variable
Although the mean amplitude for all probes within the variant region correlated well with known copy-number, the variance around this mean was relatively large in each hybridization (Fig. 1B, Supplementary Material, Table S1). In order to measure the performance of individual probes on the array, we calculated the Pearson correlation coefficient (r) between the log2 ratio and known DNA copy-number for each of the 91 069 unique-sequence probes in the common minimal region of 15q12.2–q13.1 (chr15:21224542–25623430). This distribution of probe performance is shown in Figure 2. While some probes report a log2 ratio that is well correlated with underlying DNA copy-number (18.2% of probes have r > 0.8), a similar proportion yields data that does not correlate or, in fact, correlates negatively with copy-number (17.2% of probes show r < 0.5), essentially yielding no informative data. The mean value of r for all 91 069 probes was 0.687.
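The per-probe performance measure described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the copy-number vector and the log2 ratios are made-up toy values, and only the definition of the Pearson correlation coefficient is taken from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: known copy-number of each sample and the log2
# ratios that two probes reported across the same samples.
copy_number = [1, 2, 3, 4, 5, 6]
responsive_probe = [-0.40, 0.02, 0.15, 0.33, 0.41, 0.52]
noisy_probe = [0.10, -0.05, 0.12, -0.08, 0.03, 0.01]

print(round(pearson_r(copy_number, responsive_probe), 3))
print(round(pearson_r(copy_number, noisy_probe), 3))
```

Applying the same function to every probe yields the distribution of r that is then split into quartiles or deciles in the analyses that follow.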
As the performance of individual oligonucleotide probes was highly variable, we hypothesized that this difference may be a function of the sequence or physical properties of each probe. We set out to test these hypotheses by stratifying probes and dividing into quartiles based on their strength of correlation between log2 ratio and copy-number in our series of patients. Our goal was to partition the data and systematically identify sequence properties that distinguish dynamic-range responsive probes.
Probe uniqueness
It has been previously demonstrated that non-unique probes show a reduction in specificity and sensitivity with increasing number of hybridization targets (14). Consequently, such regions are frequently excluded as part of routine copy-number detection schemes. In this design, we specifically targeted 10% of our probes (9444/101 013) to duplicated regions. We measured the genome representation of each copy as the number of near-perfect match occurrences of each probe within the human genome (hg17), termed close-match frequency (CMF). Interestingly, probes with a CMF value ≤5 showed a good correlation with expected copy-number (Fig. 3), suggesting that such regions could be informative for copy-number variation studies. However, when CMF exceeded 5, correlation rapidly deteriorated. Overall, our results show that probe performance is inversely correlated with CMF (R² = 0.603), indicating that, as expected, probes which map to unique genomic locations are more informative than probes which hybridize to multiple loci. Based on this result and due to the imprecision of breakpoints which are predicted to occur within the duplications, we limited all subsequent analyses of probe parameters to the 91 069 probes with a CMF = 1.
We used RepeatMasker to assess the repeat content of our probe set. After dividing probes into quartiles, total repeat content was almost perfectly inversely correlated with probe performance (R² = 0.999, Fig. 4B). Probes in the bottom quartile had a mean repeat content of 44.0% while those in the top quartile had a mean repeat content of 33.7%. Interspersed repeats account for ~45% of the human genome (19), suggesting that probes that are relatively deficient in common repeats compared with the genome average give superior performance. We then assessed distribution of each repeat class in the quartiles. ERVK and Alu elements showed the strongest association with reduced probe performance, and were enriched 7.2-fold and 5.4-fold in the lower versus upper quartiles, respectively. In contrast, L2s were associated with increased performance, showing a relative 1.9-fold enrichment in the upper versus lower quartile (Supplementary Material, Table S2).
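The quartile comparison used here, and in the sections that follow, can be illustrated with a small Python sketch. It assumes that fold enrichment is computed from the fraction of probes carrying a feature in each quartile; the study may instead have normalized by bases of probe sequence, so the helper names and the toy data below are hypothetical.

```python
def quartile_labels(r_values):
    """Label each probe 0 (lowest r) to 3 (highest r) by rank of its r value."""
    n = len(r_values)
    order = sorted(range(n), key=lambda i: r_values[i])
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * 4 // n
    return labels


def fold_enrichment(has_feature, labels):
    """Feature frequency in the bottom quartile divided by that in the top."""
    def fraction(quartile):
        members = [f for f, q in zip(has_feature, labels) if q == quartile]
        if not members:
            raise ValueError("quartile %d is empty" % quartile)
        return sum(members) / len(members)

    top = fraction(3)
    return float("inf") if top == 0 else fraction(0) / top


# Toy data: eight probes, their r values, and whether each overlaps a repeat.
r_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
overlaps_repeat = [True, True, False, False, False, False, True, False]

labels = quartile_labels(r_values)
print(fold_enrichment(overlaps_repeat, labels))
```

A value above 1 indicates that the feature is over-represented among poorly performing probes, and a value below 1 indicates the opposite, as reported for L2 elements.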
SNP content
The presence of underlying sequence variants at probe binding sites is one factor that can reduce the efficiency of DNA hybridization. We used data from the HapMap (IHC, 2004) to estimate the frequency of SNPs in our probe set, shown in Figure 4A. Using data from all known SNPs, there is a strong inverse correlation between probe performance and both total SNP content (R² = 0.983) and the abundance of common SNPs (defined here as SNPs with ≥10% minor allele frequency, R² = 0.974). The presence of common SNPs in a probe sequence was a strong predictor of poor probe performance, showing a 1.9-fold enrichment in the bottom versus top quartile probe sets. At probe lengths greater than 55 bp, there were no significant differences (two sample t-test with Bonferroni correction, data not shown) in performance between probes with and without common SNPs, suggesting that the effect of common SNPs on probe performance is attenuated by increased probe length (Supplementary Material, Fig. S1). These data are consistent with the presence of sequence polymorphisms at probe binding sites significantly affecting hybridization kinetics.
Probe length
In order to maintain an approximately isothermal design over regions of varying GC content, probes in our array design ranged in length from 45 to 75 bp. We plotted the relationship between probe length and performance, relative to the bottom quartile (Fig. 5). There was a strong correlation between probe length and performance for all probe lengths. The shortest probes on our array (length 45 bp, corresponding to the minimum length threshold in the design) show a 1.9-fold enrichment in the bottom quartile of probes relative to the top quartile. In contrast, probes of ≥46 bp show progressively increasing enrichment in the upper quartiles, with probes of length ≥55 bp showing an average 4.2-fold enrichment in the top versus bottom quartiles. Comparing mean probe length in the top and bottom 10% tails of the distribution clarifies the relationship with probe length. While the most informative probes have a mean length of 51.0 bp, this drops to 46.7 bp for the least informative probes. These data strongly indicate that probe performance increases with length.
Probe melting temperature
Using theoretical calculations of probe melting temperature (Tm), we plotted the relationship between probe Tm and performance for the upper and lower deciles (Fig. 6). Although probes were selected to an approximately isothermal design, there is significant probe-to-probe variation in Tm, ranging from ~69 to 83°C. Significantly, the most informative probes show increased uniformity in the distribution of Tm values. Eighty-seven percent of the most informative probes have a Tm in the range 68–71°C, compared with only 54% of the least informative probes, which instead show a distribution skewed towards higher melting temperatures. These data suggest that array designs with more uniform thermal hybridization profiles more accurately predict copy-number.
Homopolymer content
We investigated the nucleotide content of probes in the upper and lower deciles of our probe set (Fig. 7). Overall, GC content showed a bias between the most and least informative probes, with 45.2% GC nucleotides in the upper 10% tail of the distribution versus 52.5% GC in the bottom decile. Homopolymer content showed an even stronger bias. Guanine homopolymers were significantly enriched in the least informative probe set, with the motifs GGGGG and GGGGGG occurring at >20-fold increased frequency in comparison to the most informative probes. We observed no significant positional bias of these polyG motifs within probe sequences (data not shown). In comparison, polyC motifs showed a weaker effect (up to 2.5-fold enrichment for CCCCC in bottom versus top quartile of probes) while adenine and thymine homopolymers showed only small differences between the most and least informative probes (<1.6-fold difference for all polymers of A and T). These data indicate that the presence of extended polyG motifs in probe sequences significantly reduces their performance.
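The sequence features compared in this section are simple to compute. The Python sketch below shows one way to measure GC content and to flag guanine homopolymers of five or more bases; the function names and the default run length of 5 (matching the GGGGG motif discussed above) are choices made for this illustration.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a probe sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def longest_homopolymer(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    runs = re.findall(base.upper() + "+", seq.upper())
    return max((len(run) for run in runs), default=0)


def has_poly_g(seq: str, min_run: int = 5) -> bool:
    """True if the probe contains a run of at least `min_run` guanines."""
    return longest_homopolymer(seq, "G") >= min_run


# Hypothetical probe fragments used only to exercise the functions.
print(has_poly_g("ACGGGGGTA"), has_poly_g("ACGGGGTA"))
print(gc_fraction("ATGC"))
```

Counting probes for which `has_poly_g` is true in the upper and lower deciles gives the kind of fold difference reported for the GGGGG and GGGGGG motifs.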
Despite the influence of homopolymers on probe performance, we found no evidence that overall probe sequence complexity influenced performance. We tested overall sequence complexity of the entire set of probes in the upper 10% and lower 10% tails using two different standard data compression algorithms. As the extent of data compression of a text file containing these probe sets is a function of the complexity of the file content, data compressibility can be used as a measure of the overall sequence complexity of a probe set. Both algorithms used returned file compression ratios which were almost identical for the upper and lower 10% tails, indicating that there was little or no correlation between sequence complexity and probe performance (compression ratio for upper:lower 10% tail was 0.999 using WinZip and 1.019 using 7-Zip).
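The compression-based complexity comparison can be approximated with Python's standard library. The study used WinZip and 7-Zip; the sketch below substitutes `zlib` and `lzma` as stand-ins, so absolute values will differ, and the probe sets are randomly generated toy data rather than real probes.

```python
import lzma
import random
import zlib


def compressed_fraction(probes, compress=zlib.compress):
    """Compressed size divided by raw size for a newline-joined probe list."""
    data = "\n".join(probes).encode("ascii")
    if not data:
        raise ValueError("no sequence data")
    return len(compress(data)) / len(data)


def complexity_ratio(upper, lower, compress=zlib.compress):
    """Ratio of compressed fractions (upper set : lower set).

    A value near 1 means the two sets are about equally compressible,
    that is, of similar overall sequence complexity.
    """
    return compressed_fraction(upper, compress) / compressed_fraction(lower, compress)


# Toy demonstration: random 50mers versus a highly repetitive probe set.
rng = random.Random(0)
mixed = ["".join(rng.choice("ACGT") for _ in range(50)) for _ in range(200)]
repetitive = ["ACAC" * 12 + "AC" for _ in range(200)]

print(complexity_ratio(mixed, repetitive))
print(complexity_ratio(mixed, repetitive, compress=lzma.compress))
```

In this toy case the ratio is far from 1 because the two sets differ greatly in complexity, in contrast to the near-identical ratios reported for the upper and lower 10% tails.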
Covariance between variables
To assess the interdependence of variables, pairwise correlations were calculated between the presence of a GGGGG motif, presence of common SNPs, overlap with repeats, probe length and GC content. Tm is dependent on probe length and GC content and so was excluded from the analysis. The results are summarized in Supplementary Material, Table S3. For probes containing SNPs or repeats, GC content was on average lower and probe length was greater. Because lower GC content and greater length are themselves associated with better performance, this demonstrates that the presence of common SNPs or overlap with repeat elements are independent predictors of poorer probe performance. GC content and probe length were found to be negatively correlated, consistent with the fact that the oligonucleotide microarray used was designed to be isothermal (see Materials and Methods). The presence of a GGGGG motif was found to be negatively correlated with probe length, suggesting possible confounding effects. However, the effect of a GGGGG motif was still seen after stratification of probes by probe length, suggesting an effect independent of decreased probe length (Supplementary Material, Fig. S2). Despite the positive correlation between the presence of a GGGGG motif and increased GC content, a GGGGG motif likely has an independent effect on probe performance (see Discussion).
A prospective study
As a method of testing the ability of the above criteria to prospectively enrich for probes with improved performance characteristics, we utilized data from a second high-density oligonucleotide array and assessed the ability of each parameter to predict increased probe performance. This independent design included 27 275 probes covering a 1.375 Mb region of 17q12 (chr17: 31890000–33265000). Six patients with validated copy-number differences over this entire region were hybridized to the array, and mean log2 ratios for patients with 1, 2 and 3 copies were used to calculate the Pearson correlation coefficient (r) for each probe. Analysis of common repeats showed that, as had been observed for 15q, Alu content was significantly correlated with reduced probe performance, while L2s were associated with increased performance (Supplementary Material, Table S4).
We filtered this probe set to exclude probes with putative characteristics associated with reduced performance, identified from our study of 15q. Results demonstrate that nearly all probe selection criteria identified from our initial analysis resulted in significant improvements in array data quality. With the exception of the removal of probes overlapping SNPs, all other probe filters applied to the 27 275 probes in 17q12 resulted in increased mean correlation coefficients between log2 ratio and copy-number, and/or increased dynamic response compared with the unfiltered probe set (Table 1). The most extreme single increase in array performance resulted from the removal of shorter probes, length <55 bp (mean r increased from 0.735 to 0.829), but this filtering also resulted in the loss of more than 80% of all data points. However, the use of combinations of less stringent probe filters was able to improve performance even further while retaining better density.
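The trade-off between data quality and probe density described above can be explored with a simple filtering routine. The Python sketch below is illustrative only: the `Probe` fields, the filter functions and the toy values are assumptions made for this example and do not reproduce the contents of Table 1.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    length: int         # probe length in bp
    has_ggggg: bool     # contains a GGGGG motif
    overlaps_alu: bool  # overlaps an Alu repeat
    r: float            # correlation between log2 ratio and copy-number


def no_poly_g(probe: Probe) -> bool:
    return not probe.has_ggggg


def no_alu(probe: Probe) -> bool:
    return not probe.overlaps_alu


def min_length_55(probe: Probe) -> bool:
    return probe.length >= 55


def apply_filters(probes, *filters):
    """Keep only the probes that pass every supplied filter."""
    return [p for p in probes if all(f(p) for f in filters)]


def summarize(kept, total):
    """Return (mean r of retained probes, fraction of probes retained)."""
    if not kept:
        raise ValueError("all probes were filtered out")
    return sum(p.r for p in kept) / len(kept), len(kept) / total


# Made-up probe set used only to show the mechanics.
probes = [
    Probe(45, True, False, 0.40),
    Probe(50, False, True, 0.60),
    Probe(52, False, False, 0.80),
    Probe(60, False, False, 0.90),
]

for name, filters in [("GGGGG + Alu", (no_poly_g, no_alu)),
                      ("length >= 55", (min_length_55,))]:
    mean_r, retained = summarize(apply_filters(probes, *filters), len(probes))
    print(name, round(mean_r, 3), retained)
```

Reporting the retained fraction alongside mean r makes the cost of a stringent filter, such as the length cutoff, visible next to its benefit.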
| DISCUSSION |
|---|
Our analysis of individuals with pre-defined copy-number of large chromosomal regions using thousands of oligonucleotide probes has defined, in part, parameters which significantly influence microarray data quality. Further, we show that the use of these simple sequence-based criteria can prospectively select a probe set which shows superior performance characteristics for the measurement of DNA copy-number. Our results allow the design of oligonucleotide arrays with increased sensitivity and specificity compared with current designs.
Both our data and those of previous studies suggest that the single most important variable dictating array performance is probe length. He et al. (20) and Ramdas et al. (21) showed that for oligonucleotide expression arrays, signal intensity increases as a function of probe length, with an average ~20-fold increase in sensitivity for 70mers compared with 50mers. It has been reported that the optimal probe length for expression arrays is ~150 bp (22), but such lengths cannot be reliably achieved with current synthesis technologies. The major limiting factor dictating probe length is the efficiency of the photolithographic process used to synthesize probes in situ (23). Improvements in this technology have already allowed the synthesis of arrays composed of longer oligonucleotides (24), and current designs manufactured by NimbleGen utilize probes of length 50–85 bp compared with 45–75 bp probes used in this study.
It is clear that several of the probe parameters are not independent. Melting temperature is a predictor of probe performance that depends directly on GC content and probe length (see Materials and Methods). For probes to be isothermal in GC-rich regions, probe length must predictably decrease or probe performance suffers. This appears to be what was observed on the designs that we have utilized in this study. Probes with Tm significantly greater (e.g. >5°) than the array average were all found to be GC-rich 45mers. These represent probes whose optimum length on an isothermal array is <45 bp, but which have been extended in length to reach the minimum threshold requirement of the design (Fig. 8). While simply excluding such probes is one solution to improve performance, we suggest the relaxation of this minimum length threshold in favor of maintaining a constant probe Tm as an alternative solution that would avoid large gaps in probe coverage in GC-rich regions.
The presence of polyG motifs also appears to be a significant predictor of poor probe performance. Although there is also a significant increase in GC content in the worst performing probes, this alone does not explain the polyG enrichment that we observe. If this were simply a function of increased GC content, a concomitant decrease in polyA and polyT motifs would also be expected. No such bias was observed, indicating that polyG motifs, and to a lesser extent polyC, are a significant correlate of reduced probe performance. There are two possibilities which could account for this association of polyG tracts with poor probe performance: (i) reduced synthesis efficiency of polyG motifs during array manufacture such that errors are introduced into the synthesized probe sequence or (ii) reduced hybridization performance of probes containing polyG motifs. Both explanations are consistent with previous anecdotal reports of reduced probe performance of short oligonucleotides containing polyG and polyC motifs (4,5).
Conformational studies have also shown that single-stranded guanine tetranucleotide motifs are capable of forming hydrogen-bonded quaternary structures with neighboring DNA molecules, termed G quartets (G4 DNA) or polyG stacks (25). The presence of this type of probe–probe interaction would likely significantly affect probe synthesis efficiency and/or hybridization of probes to their target sequences, accounting for their decreased performance on the array (Fig. 7). Although still deleterious, it is noteworthy that probes containing polyC motifs exhibit a milder decrease in performance compared with those containing polyGs (Fig. 7). In situations where probes are sited in guanine-rich regions, our results suggest that simply switching to the use of a reverse-complement cytosine-rich probe which binds to the alternate strand may significantly improve probe performance at these loci. Improvements in probe performance with the use of strand-switched probes have been observed previously, but the underlying mechanism was unclear (26). Our data suggest that differences in poly-guanine content likely underlie this phenomenon. Approximately 3% of all probes on our arrays contained at least one GGGGG motif. Our data suggest that exclusion of these sites would significantly improve data quality without adversely affecting probe coverage.
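The strand-switching suggestion can be expressed as a small design rule. The Python sketch below is one hypothetical implementation, assuming a threshold of more than four consecutive guanines (the GGGGG motif); it simply returns the reverse complement when that reduces the longest guanine run, and it was not part of the study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    best = run = 0
    for ch in seq.upper():
        run = run + 1 if ch == base else 0
        best = max(best, run)
    return best


def choose_strand(seq: str, max_g_run: int = 4) -> str:
    """Return `seq`, or its reverse complement if that avoids a long polyG run.

    The original strand is kept unless its longest G run exceeds
    `max_g_run` and the opposite strand has a shorter one.
    """
    rc = reverse_complement(seq)
    g_run = longest_run(seq, "G")
    if g_run > max_g_run and longest_run(rc, "G") < g_run:
        return rc
    return seq


# Hypothetical fragments: the first contains GGGGG, the second does not.
print(choose_strand("ATGGGGGC"))
print(choose_strand("ATGCATGC"))
```

Switching strands converts a polyG tract into a polyC tract, which the data above suggest is less deleterious but not neutral.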
Only a small fraction of probes overlap common SNPs, and exclusion of this subset would also likely lead to improvements in array data quality. Although the probe selection algorithm excludes the placement of probes within high-frequency motifs (see Materials and Methods), a significant fraction of probes on our array still overlap common repeat sequences. While exclusion of all repeats is one option for improving data, our results show that specifically overlapping Alu repeats is a strong predictor of reduced probe performance, and exclusion of these sites improved array data while reducing coverage by <3%.
It is noteworthy that, because of our preliminary data, the second array covering 17q12 that we tested excluded probes containing the motif GGGGG and probes with abnormally high Tm from the design. Consistent with our data, this 17q12 array yielded both increased probe performance (mean value of r for the 17q12 design was 0.735 compared with 0.709 for the 15q design, when patients with 1, 2 or 3 copies of this locus are considered), and increased dynamic response (mean log2 amplitude was –0.437 compared with –0.363 for deletions, and +0.214 compared with +0.133 for duplications on the 17q12 and 15q designs, respectively). In contrast to our results from the 15q array, removing probes which overlapped SNPs in the 17q12 design did not improve the data quality. However, this is probably a result of the very low SNP content of the probes in this region, with only 35 of the 27 275 17q12 probes (0.13%) overlapping common SNPs.
The use of probe selection filters has the drawback of potentially excluding certain genomic regions from being represented on an array. As a result, the utility of applying different probe exclusion criteria will be dependent on the density requirements of an individual microarray design. We anticipate that our results will be most beneficial where greater flexibility in probe placement can be tolerated. For designs in which an entire genome is covered with widely spaced probes, stringent filtering parameters which select an optimum probe set at the expense of coverage could be applied (as shown in Table 1). However, even in high-density-targeted array designs, the use of less stringent probe filtering parameters can still lead to significant improvements in data quality with minimal loss in coverage. For example, simply excluding probes containing guanine pentamer motifs and probes with very high Tm allowed a consistently high-density design (mean density across chr17: 31890000–33265000, 1 probe per 50 bp, with <0.2% of probes separated by >1 kb intervening sequence), which showed significantly improved data quality compared with our initial naïve design of 15q.
We propose that the ability to accurately assess copy-number at specific genomic loci will be crucial for the success of future genetic studies. To this end, our results define a set of criteria that can be used for the development of improved array-based genotyping assays which yield increased data quality.
| MATERIALS AND METHODS |
|---|
Patient description and array design
The six patients with rearrangements of chromosome 15 used in this study had been characterized previously using both FISH (S. Schwartz, unpublished data) and BAC array CGH (14) to determine copy-number in the region 15q12.2–q13.1. For analysis of these patients, we utilized a custom oligonucleotide array designed against the NCBI Build 35/May 2004 assembly (NimbleGen Systems, Madison WI), comprising 348 704 oligos covering 14 070 933 bp in 15q11.2–q14 (chr15:19099848–33170780). This yielded a mean density of 1 oligo every 40 bp over this region. Although the design included an additional 39 297 probes at other loci throughout the genome, only probes contained within 15q11.2–q14 were considered for further analysis. Probe placement and design utilized proprietary software (NimbleGen Systems, Madison WI). Probe length varied from 45–75 bp to yield an approximately isothermal array design with a mean Tm of 76°C. Potential probes were excluded based on the following criteria: first, a 15 bp sliding window analysis was performed at 1 bp increments throughout the entire genome. At each window position, the number of perfect matches to other genomic loci was calculated. For every probe sequence, the mean number of genomic 15mer matches was calculated, and probes with a mean score >100 were excluded from the design; secondly, each probe was assigned a uniqueness score. This score, termed the CMF, was defined as the number of locations in the genome which match the probe sequence allowing for ≤5 bp of insertion, deletion or substitution between probe and target. Any probes with a CMF >10 were excluded from the array design.
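The first exclusion criterion, the mean 15mer match score, can be sketched in Python as follows. The actual design used proprietary NimbleGen software on the whole genome; this toy version counts exact k-mer occurrences on a single strand of a short string, includes the probe's own locus in each count, and uses k = 3 in the example so that the result is easy to check by hand.

```python
from collections import Counter


def kmer_counts(genome: str, k: int = 15) -> Counter:
    """Count every k-mer seen when sliding a window in 1 bp increments."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def mean_kmer_score(probe: str, counts: Counter, k: int = 15) -> float:
    """Mean genomic occurrence count over all k-mer windows of a probe."""
    windows = [probe[i:i + k] for i in range(len(probe) - k + 1)]
    if not windows:
        raise ValueError("probe is shorter than k")
    return sum(counts[w] for w in windows) / len(windows)


# Toy genome and probe (hypothetical sequences).
genome = "ACGACGACGTTT"
counts = kmer_counts(genome, k=3)
score = mean_kmer_score("ACGAC", counts, k=3)
print(round(score, 3))
```

In a real design the score would be compared with the threshold of 100 stated above, and the CMF would be computed separately with an aligner that tolerates mismatches and indels.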
Array hybridization and data analysis
All hybridizations were performed as described previously (27) against DNA isolated from lymphoblastoid cells derived from a single normal female individual (NA15510, Coriell, Camden NJ) used as reference. The reference has been well characterized in previous studies (28).
For the analysis of probe performance, a total set of 101 013 probes was initially considered in the common minimal 15q11.2–q12 region that was rearranged in the seven individuals studied (chr15:21224542–25623430, removing probes which overlapped a region of copy-number polymorphism between these cases at chr15:22963527–23050460). This set was then reduced to 91 069 probes by removing all probes with a CMF >1 (those with multiple hybridization targets).
The log2 ratio here is the base 2 logarithm of the ratio of experimental-to-reference signals obtained from array hybridization. The log2 ratios were calculated from signal data and subsequently normalized using qspline normalization as described by Workman et al. (29). We report linear correlations (Pearson correlation coefficient) throughout. For our results, linear and logarithmic correlation coefficients were found to be similar measures of probe performance. For probes in the bottom quartile of linear correlation, 86% were in the bottom quartile of logarithmic correlation and all were below the median value of logarithmic correlation. For probes in the top quartile of linear correlation, 70% were in the top quartile of logarithmic correlation and 98% were above the median value of logarithmic correlation. Because of the high content (~30%) of probes of abnormal copy-number in these hybridizations, normalization created an artifact whereby the log2 ratios of all probes in each data set were significantly shifted from normality. In order to correct for this artifact, we adjusted the log2 ratios in each hybridization by re-normalizing against a region known to be invariant in copy-number in all seven cases, as follows: we calculated the mean log2 ratio for all 45 632 probes contained in the region chr15:30688005–32457939 in each hybridization. The log2 ratio for all probes on the array was then adjusted by this differential for each respective hybridization.
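The re-normalization step amounts to re-centering each hybridization on its copy-number-invariant region. A minimal Python sketch is given below, assuming that "adjusted by this differential" means subtracting the mean log2 ratio of the invariant-region probes; the coordinates and values in the example are toy data.

```python
def renormalize(log2_ratios, positions, invariant_start, invariant_end):
    """Shift log2 ratios so the invariant region has a mean of zero.

    `positions` holds one genomic coordinate per probe; probes whose
    coordinate lies within [invariant_start, invariant_end] define the
    offset that is subtracted from every probe in the hybridization.
    """
    invariant = [
        value
        for value, pos in zip(log2_ratios, positions)
        if invariant_start <= pos <= invariant_end
    ]
    if not invariant:
        raise ValueError("no probes fall in the invariant region")
    offset = sum(invariant) / len(invariant)
    return [value - offset for value in log2_ratios]


# Toy hybridization: two probes in a deleted region, two in the
# invariant region, all shifted upward by the normalization artifact.
ratios = [-0.7, -0.6, 0.2, 0.4]
coords = [100, 200, 1000, 1100]
adjusted = renormalize(ratios, coords, 900, 1200)
print([round(v, 2) for v in adjusted])
```

The correction is applied separately to each hybridization, since the size of the shift depends on how much of the array is of abnormal copy-number in that sample.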
Prospective testing of probe selection criteria
Data from a second custom NimbleGen array design were used for the prospective testing of probe selection criteria. This design included multiple genomic regions, covering a total of 20.6 Mb of sequence, mean density 1 probe per 53 bp. In addition to the design parameters stated earlier, this second array also excluded: (i) probes containing the motif GGGGG and (ii) probes with Tm >85° (as defined by NimbleGen's Tm calculation). We utilized data from six individuals who had been shown by previous molecular studies (FISH, BAC array CGH, microsatellite analysis and/or qPCR) to possess 1 (n = 2), 2 (n = 2) or 3 copies (n = 2) of a region of 17q12 (7,30,31). The common minimal region that was rearranged in all six individuals studied covered 1.375 Mb (chr17: 31890000–33265000), comprising 27 275 independent oligonucleotide probes. Mean log2 ratios for patients with 1, 2 and 3 copies were used to calculate the Pearson correlation coefficient (r) for each probe.
Repeat and SNP content analysis
Probe repeat content was measured by performing an overlap with the RepeatMasker track (http://genome.ucsc.edu/). The SNP content of each probe was measured (CEU population, data release 20/phase II/January 2006, International HapMap Project, http://www.hapmap.org/cgi-perl/gbrowse/hapmap20_B35/) and the mean SNP content of each probe quartile calculated by normalizing against total bp of probe sequence.
Tm calculation
Probe melting temperature (Tm) was calculated using the formula: Tm = 64.9 + 41 × (y + z − 16.4)/(w + x + y + z) = 64.9 + 41 × (GC content − 16.4/probe length), where w, x, y and z are the numbers of A, T, G and C bases in the sequence, respectively, and GC content is the fraction of G and C bases (http://www.basic.northwestern.edu/biotools/oligocalc.html).
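A minimal Python sketch of this formula is given below, together with a hypothetical helper that applies the two exclusion criteria used for the second array design (the GGGGG motif and Tm >85°C). Note that the array design applied NimbleGen's own Tm calculation for the threshold; using the formula above in its place is an assumption made purely for illustration.

```python
def melting_temperature(seq):
    """Tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ATGC"}
    length = sum(counts.values())
    if length == 0:
        raise ValueError("sequence contains no A, T, G or C bases")
    gc = counts["G"] + counts["C"]
    return 64.9 + 41.0 * (gc - 16.4) / length

def passes_filters(seq, tm_max=85.0):
    """Hypothetical filter: reject GGGGG-containing or high-Tm probes.

    Assumption: melting_temperature() stands in for NimbleGen's
    Tm calculation, which is not described in the text.
    """
    return "GGGGG" not in seq.upper() and melting_temperature(seq) <= tm_max
```

For a 50mer with 25 G or C bases, the formula gives 64.9 + 41 × (25 − 16.4)/50 ≈ 71.95°C.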
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| FUNDING |
|---|
This work was supported by NIH grants HD043569 and HG004120, and a fellowship from Merck Research Laboratories to A.J.S.
| ACKNOWLEDGEMENTS |
|---|
E.E.E. is an Investigator of the Howard Hughes Medical Institute.
Conflict of Interest statement. None declared.
| REFERENCES |
|---|
1. Gonzalez E., Kulkarni H., Bolivar H., Mangano A., Sanchez R., Catano G., Nibbs R.J., Freedman B.I., Quinones M.P., Bamshad M.J., et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science (2005) 307:1434–1440.
2. Yang Y., Chung E.K., Wu Y.L., Savelli S.L., Nagaraja H.N., Zhou B., Hebert M., Jones K.N., Shu Y., Kitzmiller K., et al. Gene copy-number variation and associated polymorphisms of complement component C4 in human systemic lupus erythematosus (SLE): low copy number is a risk factor for and high copy number is a protective factor against SLE susceptibility in European Americans. Am. J. Hum. Genet. (2007) 80:1037–1054.
3. Kane M.D., Jatkoe T.A., Stumpf C.R., Lu J., Thomas J.D., Madore S.J. Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res. (2000) 28:4552–4557.
4. Zhang L., Wu C., Carta R., Zhao H. Free energy of DNA duplex formation on short oligonucleotide microarrays. Nucleic Acids Res. (2007) 35:e18.
5. Mei R., Hubbell E., Bekiranov S., Mittmann M., Christians F.C., Shen M.M., Lu G., Fang J., Liu W.M., Ryder T., et al. Probe selection for high-density oligonucleotide arrays. Proc. Natl Acad. Sci. USA (2003) 100:11237–11242.
6. Barrett M.T., Scheffer A., Ben-Dor A., Sampas N., Lipson D., Kincaid R., Tsang P., Curry B., Baird K., Meltzer P.S., et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc. Natl Acad. Sci. USA (2004) 101:17765–17770.
7. Sharp A.J., Hansen S., Selzer R., Cheng Z., Regan R., Hurst J.A., Blair E., Hennekam R.C., Fitzpatrick C.A., Segraves R., et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. (2006) 38:1038–1042.
8. Bertone P., Stolc V., Royce T.E., Rozowsky J.S., Urban A.E., Zhu X., Rinn J.L., Tongprasit W., Samanta M., Weissman S. Global identification of human transcribed sequences with genome tiling arrays. Science (2004) 306:2242–2246.
9. Zilberman D., Gehring M., Tran R.K., Ballinger T., Henikoff S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat. Genet. (2007) 39:61–69.
10. Sabo P.J., Kuehn M.S., Thurman R., Johnson B.E., Johnson E.M., Cao H., Yu M., Rosenzweig E., Goldy J., Haydock A., et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat. Methods (2006) 3:511–518.
11. Mito Y., Henikoff J.G., Henikoff S. Genome-scale profiling of histone H3.3 replacement patterns. Nat. Genet. (2005) 37:1090–1097.
12. Kim T.H., Barrera L.O., Zheng M., Qu C., Singer M.A., Richmond T.A., Wu Y., Green R.D., Ren B. A high-resolution map of active promoters in the human genome. Nature (2005) 436:876–880.
13. Leiske D.L., Karimpour-Fard A., Hume P.S., Fairbanks B.D., Gill R.T. A comparison of alternative 60-mer probe designs in an in-situ synthesized oligonucleotide microarray. BMC Genomics (2006) 7:72.
14. Locke D.P., Segraves R., Nicholls R.D., Schwartz S., Pinkel D., Albertson D.G., Eichler E.E. BAC microarray analysis of 15q11–q13 rearrangements and the impact of segmental duplications. J. Med. Genet. (2004) 41:175–182.
15. Amos-Landgraf J.M., Ji Y., Gottlieb W., Depinet T., Wandstrat A.E., Cassidy S.B., Driscoll D.J., Rogan P.K., Schwartz S., Nicholls R.D. Chromosome breakage in the Prader-Willi and Angelman syndromes involves recombination between large, transcribed repeats at proximal and distal breakpoints. Am. J. Hum. Genet. (1999) 65:370–386.
16. Crolla J.A., Harvey J.F., Sitch F.L., Dennis N.R. Supernumerary marker 15 chromosomes: a clinical, molecular and FISH approach to diagnosis and prognosis. Hum. Genet. (1995) 95:161–170.
17. Roberts S.E., Dennis N.R., Browne C.E., Willatt L., Woods G., Cross I., Jacobs P.A., Thomas S. Characterisation of interstitial duplications and triplications of chromosome 15q11–q13. Hum. Genet. (2002) 110:227–234.
18. Schinzel A.A., Brecevic L., Bernasconi F., Binkert F., Berthet F., Wuilloud A., Robinson W.P. Intrachromosomal triplication of 15q11–q13. J. Med. Genet. (1994) 31:798–803.
19. IHGSC (International Human Genome Sequencing Consortium). Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
20. He Z., Wu L., Fields M.W., Zhou J. Use of microarrays with different probe sizes for monitoring gene expression. Appl. Environ. Microbiol. (2005) 71:5154–5162.
21. Ramdas L., Cogdell D.E., Jia J.Y., Taylor E.E., Dunmire V.R., Hu L., Hamilton S.R., Zhang W. Improving signal intensities for genes with low-expression on oligonucleotide microarrays. BMC Genomics (2004) 5:35.
22. Chou C.C., Chen C.H., Lee T.T., Peck K. Optimization of probe length and the number of probes per gene for optimal microarray analysis of gene expression. Nucleic Acids Res. (2004) 32:e99.
23. Singh-Gasson S., Green R.D., Yue Y., Nelson C., Blattner F., Sussman M.R., Cerrina F. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat. Biotechnol. (1999) 17:974–978.
24. Woll D., Walbert S., Stengele K.P., Green R., Albert T., Pfleiderer W., Steiner U.E. More efficient photolithographic synthesis of DNA-chips by photosensitization. Nucleosides Nucleotides Nucleic Acids (2003) 22:1395–1398.
25. Poon K., Macgregor R.B. Unusual behavior exhibited by multistranded guanine-rich DNA complexes. Biopolymers (1998) 45:427–434.
26. Baldocchi R.A., Glynne R.J., Chin K., Kowbel D., Collins C., Mack D.H., Gray J.W. Design considerations for array CGH to oligonucleotide arrays. Cytometry A (2005) 67:129–136.
27. Selzer R.R., Richmond T.A., Pofahl N.J., Green R.D., Eis P.S., Nair P., Brothman A.R., Stallings R.L. Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer (2005) 44:305–319.
28. Tuzun E., Sharp A.J., Bailey J.A., Kaul R., Morrison V.A., Pertz L.M., Haugen E., Hayden H., Albertson D., Pinkel D. Fine-scale structural variation of the human genome. Nat. Genet. (2005) 37:727–732.
29. Workman C., Jensen L.J., Jarmer H., Berka R., Gautier L., Nielser H.B., Saxild H.H., Nielsen C., Brunak S., Knudsen S. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. (2002) 3:research0048.1–research0048.16.
30. Bellanne-Chantelot C., Clauin S., Chauveau D., Collin P., Daumont M., Douillard C., Dubois-Laforgue D., Dusselier L., Gautier J.F., Jadoul M., et al. Large genomic rearrangements in the hepatocyte nuclear factor-1beta (TCF2) gene are the most frequent cause of maturity-onset diabetes of the young type 5. Diabetes (2005) 54:3126–3132.
31. Mefford H.C., Clauin S., Sharp A.J., Moller R.S., Ullmann R., Kapur R., Pinkel D., Cooper G.M., Ventura M., Hilger-Ropers H., Tommerup N., Eichler E.E., Bellanne-Chantelot C. Recurrent reciprocal genomic rearrangements of 17q12 are associated with renal disease, diabetes and epilepsy. Am. J. Hum. Genet. (in press).