Human Molecular Genetics Advance Access originally published online on July 23, 2009
Human Molecular Genetics 2009 18(20):3864-3875; doi:10.1093/hmg/ddp330
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Expression differences by continent of origin point to the immortalization process
Adam R. Davis1,* and
Isaac S. Kohane1,2
1 i2b2 National Center for Biomedical Computing, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA and
2 Harvard Medical School Center for Biomedical Informatics, Boston, MA, USA
* To whom correspondence should be addressed at: i2b2 National Center for Biomedical Computing, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA. Tel: +1 6173552933; E-mail: ardavis{at}partners.org
Received April 15, 2009; Revised June 15, 2009; Accepted July 16, 2009
ABSTRACT
Analysis of recently available microarray expression data sets
obtained from immortalized cell lines of the individuals represented
in the HapMap project has led to inconclusive comparisons across
cohorts with different ancestral continent of origin (ACOO).
To address this apparent inconsistency, we applied a novel approach
to accentuate population-specific gene expression signatures
for the CEU [homogeneous US residents with northern and western
European ancestry (HapMap samples)] and YRI [homogeneous Yoruba
people of Ibadan, Nigeria (HapMap samples)] trios. In this report,
we describe how four independent data sets point to the differential
expression across ACOO of gene networks implicated in transforming
the normal lymphoblast into immortalized lymphoblastoid cells.
In particular, Werner syndrome helicase and related genes are
differentially expressed between the YRI and CEU cohorts. We
further demonstrate that these differences correlate with viral
titer and that both the titer and expression differences are
associated with ACOO. We use the 14 genes most differentially
expressed to construct an ACOO-specific immortalization
network comprising 40 genes, one of which shows significant
correlation with genomic variation (eQTL). The extent to which
these measured group differences are due to differences in the
immortalization procedures used for each group or reflect ACOO-specific
biological differences remains to be determined. That the ACOO
group differences in gene expression patterns may depend strongly
on the process of transforming cells to establish immortalized
lines should be considered in such comparisons.
INTRODUCTION
Several recent studies of populations of different ancestral
continent of origin (ACOO) have identified ACOO-specific gene
expression differences. Because the sets of genes identified
in these studies are largely non-overlapping, the biological
interpretation of these results is challenging (1–6).
Given the importance to health disparities of such studies,
we have undertaken an integrative approach to determine whether
indeed there is a consistent difference. We have also added
a new study sample to further validate our findings. Cross-population
expression studies are fraught with the well-known variability
in the biology as well as the difficulties in comparing transcriptome-wide
measures from different platforms (7, 8) and the increasingly
documented intrinsic biases of expression patterns of immortalized
cell lines (6). Technical bias may affect many genes in concert,
thus causing spurious correlations in clinical data sets and
false associations between genes and clinical variables (9).
The study of the transcriptome in groups with different ACOO
is particularly problematic in that most of these studies are
performed on Epstein–Barr virus (EBV) immortalized cell
lines. Specifically, the International HapMap Project harvested
peripheral blood lymphoblasts from the homogeneous Yoruba tribe
from Ibadan, Nigeria (YRI) and then transformed them into immortalized
cells in vitro using the EBV. This is of potential additional
relevance, as the YRI population is one of the sub-Saharan populations
known to suffer from an endemic childhood cancer, Burkitt lymphoma
(BL), caused by the EBV that environmentally saturates sub-Saharan
Africa (10–13). In contrast, the CEU [homogeneous US residents
with northern and western European ancestry (HapMap samples)]
population as well as other populations with European ancestry
has to date no reported predisposition or population-specific
susceptibility to EBV infection. This raises the question of
the degree to which the reported expression differences are
due to laboratory technique, measurement platform difference,
laboratory-specific variation in EBV-driven cell immortalization,
or COO-specific responses to EBV infection and immortalization.
To explore this question, we filtered samples and genes to accentuate
population stratification between CEU and YRI trios. Our guiding
principle was to select for samples and genes with the highest
consistency within ACOO and the least overlap across ACOO. Our
approach is outlined in Figure 1. We analyzed four independent
recent studies, three of which were previously published and conducted
on immortalized cell lines (5, 14, 15), to find the reproducible
differences by ACOO across two expression array platforms (Affymetrix
and Illumina), and a fourth analysis was performed on an expression
experiment of primary lymphoid cells from African Americans
(AAs) and Caucasians (CAs) (16). Further descriptions of the
experiments, types of array platform and genes analyzed are
listed in Supplementary Material, Table S1. To reduce noise
from the varied measurement platforms and laboratory-specific
technique, this analysis was intentionally driven to high specificity
at the cost of sensitivity (9) by the filtering process, as
described. Our analysis identified an immortalization
network consisting of 40 genes, of which 24 genes are
differentially expressed between the CEU and YRI populations.
Furthermore, one of these genes, Werner syndrome helicase (WRN),
is significantly correlated with EBV titer. Subsequently, we
relaxed the original aggressive filtering of the data and found
the large majority of the immortalization network's genes were
differentially expressed across ACOO. Moreover, we identified
a cis eQTL in the gene POLR1A in the network with respect to ACOO.

Figure 1. Analytic flow of the expression analysis of ACOO. Shaded boxes at the top represent independent data sets of gene expression profiling. The topmost three boxes are three experiments by different investigators on two expression profiling platforms measuring expression in the immortalized lymphoblasts of the YRI and CEU HapMap individuals. The fourth data set was measured on a group of children (CA and AA) who served as controls in an unrelated (autism) study. The cells in this population were not immortalized prior to measurement. Eighty probe sets were measured as significantly differentially expressed across the three immortalized cell data sets. Of those, 66 were also differentially expressed in the non-immortalized data set, and the subsequent analysis focused on the 14 probe sets that were differentially expressed only in the immortalized cells. Twelve of those 14 probe sets were mapped to genes in IPA, and a network (dubbed the COO Immortalization Network) of 40 genes was automatically constructed. This network was then assessed against the three original expression data sets in two ways. First, one gene was identified as having a significant eQTL based on the associated HapMap SNP data. Second, an additional 11 genes from the immortalization network were differentially expressed across all three data sets in addition to the original 12 found (through a much more stringent filter).
RESULTS
Identification of initial COO differential expression
We started the analysis with the reproducibility of the COO-specific
differences in the first study (4), across two trios (CEU and
YRI) divided into four populations: HapMap parents (YRIp and
CEUp) and separately HapMap children (YRIc and CEUc). We selected
those genes that were expressed most consistently within the
YRI and separately CEU populations, respectively, and then identified
those of the intersecting set that were significantly differentially
expressed. The number of genes consistently expressed within COO
and shared across both populations differed for the parents
(n = 1043) when compared with their children (n = 568). The
shared set of genes that were highly consistently expressed
in both parental and child populations and that also were significantly
differentially expressed after Bonferroni correction numbered
228 (Supplementary Material, Table S2). The biological functions
significantly enriched [as per the Ingenuity IPA program (17)]
in the differentially expressed genes included processing
and splicing of mRNA, immortalization of cells, transcription
and expression of DNA, synthesis and metabolism of proteins,
processing and modification of rRNA, receptor-mediated endocytosis,
transport and catabolism of proteins, colony formation, activation
of HIV type 1, ubiquitination and cholangiocarcinoma (data not
shown). Of the 228 genes differentially expressed across ACOO,
the top 20 genes most correlated with WRN, using Pearson correlation,
were identified and highlighted with an * in Supplementary Material, Table S2.
Of note, the viral titer (courtesy David Altshuler, see Materials
and Methods) correlated significantly with WRN gene expression
across the filtered CEU and YRI samples from Stranger et al. (5)
with an R² = 0.69 and regression-significant P < 2.2 × 10⁻¹⁶ (Fig. 2A).
Separately, the children's EBV titer correlated with WRN expression
with an R² of 0.86 and P-value of 2.89 × 10⁻¹³, and the parents' EBV
titer correlated with WRN expression with an R² of 0.70 and P-value
of 1.18 × 10⁻¹³ (data not shown). The distribution of WRN values
is much higher than the average expression of genes in the genome
across all samples, which is consistent with previous reports
of WRN having high levels of expression in immortalized cells.
The 20 genes closely correlated with WRN also have higher mean
expression across the CEU and YRI populations when compared
with WRN and all the transcripts measured on the arrays (Fig. 2B).

Figure 2. (A) Correlation of WRN to relative EBV titer across the filtered CEU and YRI samples and (B) the distribution of non-normalized WRN values and the mean values of the 20 genes across the CEU and YRI populations and for all the transcripts measured on the arrays.
Cross platform validation of differentially expressed genes
We conducted further analyses on an additional independent CEU
and YRI population's transcriptome study. This study was performed
on the Affymetrix GeneChip Human Genome U133 Array Set HG-U133A
(15). Of the 228 genes significantly different on the Illumina
platform between CEU and YRI, there were 99 probe sets corresponding
to the same genes significantly different on the Affymetrix
platform. Of these 99 probe sets, 21 were removed because the
differential expression was discordant (down-regulated for the YRI population
on the Illumina platform but up-regulated compared with the CEU
on the Affymetrix platform), leaving 78 probe sets for further
analyses (Table 1). WRN was also among the genes that were
significantly different on the Affymetrix HG-U133A platform.
In a third, but much smaller, data set, we applied the aforementioned
filtering process on only eight CEU and eight YRI founder males
from the Affymetrix Human Focus Array and only one gene, WRN,
was found to be significantly different between CEU and YRI
samples. That is, WRN is significantly differentially expressed
in three independent studies (4, 14, 15). The top diseases and
disorders enriched (as per the Ingenuity IPA program) were viral
function, connective tissue disorders (immortalization), cancer,
cardiovascular disease and endocrine system disorders. WRN is
among the genes in each of the top three enriched categories.
The biological functions significantly enriched in the differentially
expressed genes included processing and splicing of mRNA, cross-link
repair of DNA, viral transactivation, immortalization of cells,
transcription and expression of DNA, cell division, colony formation,
contact growth inhibition, apoptosis, cell death, synthesis
of proteins and gastric carcinoma (Table 2). Additionally,
we performed linear regression analyses to determine the squared
Pearson correlation coefficients (R²) and P-values of the 20
genes most correlated with WRN (dependent variable) mRNA expression
in a pairwise manner out of the 78 probe sets cross-platform
validated for ACOO differential expression. We used an R² cutoff
of 0.7. Consequently, the top 20 correlated probe sets have
an R² between 0.69 and 0.84, and P-values <2.2 × 10⁻¹⁶, as described
in Table 3. Sixteen (80%) of the 20 top correlated
genes grouped with WRN into one biological functions network
associated with gene expression, infection mechanism and
cancer with an enrichment P-value of 1.0 × 10⁻⁴⁷. Seven of the
top 20 genes are members of the final 12-gene set that comprised
the immortalization network. We created an annotated network
of these 20 genes entitled the Viral infection network,
with the transcription factors MYC and P53 serving as the central
hubs of this network (Fig. 3).

Figure 3. Of the 78 probe sets cross-platform validated with ACOO differential expression, 16 (80%) of the top 20 WRN-correlated genes (R² between 0.69 and 0.84) grouped with WRN into one biological functions network associated with gene expression, infection mechanism and cancer with an enrichment P-value of 1.0 × 10⁻⁴⁷.
Table 1. The 78 probe sets corresponding to 53 genes drawn from the 99 probe sets list generated from the intersection between the Illumina and Affymetrix platforms of those genes that were the most consistently expressed within the YRI population and the CEU population, respectively, in both parents and in children
Table 2. The 51 functions identified by the IPA package for the 78 cross-platform probe sets differentially expressed between CEU and YRI trios
Table 3. The top 20 Pearson correlation coefficients (R²), F-statistic and P-values of WRN (dependent variable) mRNA expression in a pairwise manner to all 78 probe sets cross-platform validated with ACOO differential expression
Table 4. The 24 immortalization probe set IDs and ACOO expression differences in P-values for SekWon et al.'s primary LBC data set (Affymetrix.GeneChip.HG-U133_Plus_2), Yelensky et al.'s (Affymetrix.GeneChip.HG-U133A) and Stranger et al.'s (Illumina WGA-6) immortalized LBC data sets
Identification of ACOO immortalization sensitive genes
To further explore which subset of the COO differentially expressed
genes is specific to ACOO but not to immortalization, and which is specific
to differences in the immortalization process with respect to
ACOO, the results above were contrasted with an expression study
of non-immortalized lymphoid cells harvested from the peripheral
blood of AA and CA children. Figure 4 depicts a Venn
diagram of the 78 significantly differentially expressed probe
sets across platforms (Illumina and Affymetrix) between the
immortalized CEU/YRI cells. Of those, 64 probe sets (82%) were
confirmed to be significantly different between the AA and CA
child populations. This left 14 probe sets (including WRN)
that were differentially expressed across the CEU and YRI only in
the immortalized cell experiments.

Figure 4. Twelve of the 14 probe sets identified in the Venn diagram with immortalized cell-specific differential expression (circled in Venn diagram) mapped to 12 independent genes in the Ingenuity Pathway program to construct the immortalization network. The 12 independent genes are depicted in red. POLR1A, which has a heritable eQTL in the YRI population with significant differential expression by ACOO, is in green. The additional genes whose expression differs significantly by ACOO but that are not immortalization specific are in yellow.
An EBV immortalization gene network
The 14 probe sets that are significantly different between CEU
and YRI immortalized cells that were not identified in non-immortalized
lymphoblast cells (LCs) were mapped into Ingenuity's (IPA) package
(Ingenuity® Systems, www.ingenuity.com) to determine which
networks were enriched with these genes. Twelve of the 14 probe
sets were mapped into IPA, identifying 12 genes (two were unmapped
ESTs): ARCN1, ATP5B, JMJD1B, NOL7, NUP54, PFN1, POLR2B, PRCC,
PUM1, PWP1, WRN and ZNF410. The genes clustered into three significantly
overrepresented/enriched networks, with 10 genes mapped into
the top-scoring network of DNA replication, recombination and
repair with a P-value of 10⁻⁷. JMJD1B and PUM1 mapped
separately to Networks 2 and 3. The 10 genes from Network 1
were exported into Ingenuity's Pathway editor to build a combined
Immortalization Network that includes JMJD1B and
PUM1 (colored red in Fig. 4). There were several genes
enriched in the Immortalization Network that were
not part of the original 14-gene list. Subsequent to finding
the marked network enrichment score, we relaxed the cutoffs
in three ways (intra-population consistency criterion, P-value
cutoff and multiple test correction; see Materials and Methods
for more detail) in determining the statistical inference of
the additional genes in the Immortalization Network, for the
Illumina Platform only. By relaxing the aggressive filtering
(of samples and genes) originally performed to increase specificity
across the noisy and different expression platforms, an additional
11 genes (NUP62, BAT1, PSME3, SFRS2, PLRG1, CDC5L, EXO1, FEN1,
DNAJA1, VCP and ZNF512B) were identified that have an ACOO-significant
expression difference (Table 4) in the Immortalization
Network (colored yellow in Fig. 4).
Continent of origin (COO) eQTLs within the associated immortalization pathway
We determined whether any of the genes in the Immortalization Network that had an ACOO-significant expression difference across the two immortalized and control data sets manifested heritable eQTL differences between CEU and YRI by using the public SNP data from NCBI build 36 (dbSNP b126) (http://ftp.hapmap.org/genotypes/2008-10_phaseII/). There was one gene, POLR1A (colored green in Fig. 4), whose expression in the YRI cohort founders (60 samples) was associated with SNP rs12124 in a cis eQTL (–log10 P-value = 5.77 x 10–9). POLR1A also has ACOO-discordant expression across all three data sets. This eQTL finding is consistent with a previous report by Stranger et al. (data not shown).
DISCUSSION
The YRI is one of the native sub-Saharan populations suffering
from the childhood cancer pandemic BL caused by the EBV. The
International HapMap Project harvested peripheral blood lymphoblasts
from the YRI trios and then transformed them into immortalized
cells using EBV in vitro. This raised the question of the degree
to which the previously reported expression differences are
due to laboratory technique, measurement platform difference,
laboratory-specific variation in EBV-driven cell immortalization
or COO-specific responses to EBV infection and immortalization.
To explore this question, we tailored the approach outlined in
Figure 1. This analysis led to the identification of an
immortalization network characterizing the expression differences
specific to the immortalization process of the CEU and YRI samples
across three independent studies (4, 14, 15) and distinct from
a fourth independent study of ACOO differences in non-immortalized
cells of AA and CA cohorts (16). Of note, one of the genes in
this network, WRN, a gene mutated in Werner Syndrome (WS), a
recessive genetic disorder associated with a complex premature
ageing phenotype, has been shown to modulate the efficiency
of EBV immortalization of LC lines (18, 19), possibly through
its role in the stabilization of telomeres and telomerase and
the immortalized genome (20, 21). Likewise, the expression of
WRN (and the other genes in the immortalization network) is
highly correlated with EBV titer (Fig. 2). Sixteen (80%)
of the top 20 genes most correlated with WRN grouped with it
into one biological functions network associated with gene
expression, infection mechanism and cancer, here termed the
Viral infection network.
Seven of the top 20 genes of the viral infection network are
part of the final 12 genes that framed the immortalization network.
At the center of this network are the transcription factors MYC
and P53. The MYC gene was recently reported by Faumant
et al. (22) to be one of the two master transcriptional
systems activated in the latency III program of EBV immortalization
of B-cells. Among their reported major players in the EBV immortalization
process are EXO1 and FEN1 which both directly bind to WRN and
are significantly different and enriched in our reported immortalization
network. In addition, p53 is among the genes in the viral infection
network and was reported recently by Yi et al. (23) to have
its transcriptional and apoptotic activities modulated by the
EBV latent antigen EBNA3C, which is essential for in vitro B-cell
immortalization. This analysis does not rule out the possibility
that all the observed COO differences are a function of a batch
effect of the different times, techniques and laboratories involved
in the immortalization process of the different HapMap populations
even with observed differences in three sets of experiments.
However, POLR1A's significantly up-regulated expression and
the specific eQTL within the YRI founders may play a role in
this population's increased sensitivity to EBV infection. Evidence
recently published by Michiels et al. (24), albeit circumstantial,
supports a possible role of POLR1A as a marker for head-and-neck
cancers. Additionally, Shiratori et al. (25) reported
that in WS fibroblasts, the WRN gene promotes rRNA transcription
as a component of an RNA polymerase I (RPI)-associated complex,
of which POLR1A is one of the core subunits (26). Shiratori
et al.'s study identified decreased levels of rRNA transcription
compared with wild-type cells as a measurable marker for characterizing
the premature aging of WS. They further showed how fibroblast
cells in the presence of wild-type WRN increased rRNA levels
and cell proliferation. Although further studies are required
to elucidate POLR1A's role in EBV-transformed B-cells, our findings
shed light on POLR1A as a component of the EBV in vitro cell
immortalization process with a possible ACOO hereditary signature.
The findings presented here are consistent with the yet unproven
hypothesis that these in vitro results echo population health;
that is, the sensitivity of lymphoblastoid cell lines to EBV immortalization
may mirror the EBV infection pandemic in Central Africa. The
aforementioned data are presented as initial evidence of a set
of genes that differ in expression by ACOO and among them a
subset of genes that is environmentally sensitive to EBV in
healthy individuals. Further studies are required to evaluate
this hypothesis, and measurements in individuals with different
COO during in vivo EBV infection might be illuminating in this
regard.
MATERIALS AND METHODS
Normalization
In the initial analysis of the Illumina Human V6 arrays used
by Stranger et al. (4) and the Affymetrix Human Focus arrays
used by Storey et al. (14), array probe set intensities that
were <0.01 were set to 0.01. For each individual array, all
probe sets were divided by the 50th percentile of all probe
sets on that array, and then each gene was divided by the median
of its measurements across all arrays. For the U133 Array Set
HG-U133A and the HG-U133-Plus-2 arrays, we applied GCRMA normalization.
The expression arrays used to determine eQTLs were normalized
as described for the Bioconductor (27) program GGtools 3.0 created
by Vince Carey (28).
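The floor-and-median scaling described for the Illumina and Human Focus arrays can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function name `normalize_arrays` and the list-of-lists input layout are hypothetical, and GCRMA (applied to the U133 arrays) is not reproduced here.

```python
from statistics import median

FLOOR = 0.01  # intensities below this value are raised to it, as described in the text


def normalize_arrays(arrays):
    """Floor, then scale per array, then scale per probe set.

    `arrays` is a list of arrays; each array is a list of probe-set
    intensities in the same probe-set order (a hypothetical layout).
    """
    # Step 1: set intensities below the floor to the floor.
    floored = [[max(value, FLOOR) for value in array] for array in arrays]
    # Step 2: divide every probe set by the 50th percentile of its own array.
    per_array = []
    for array in floored:
        array_median = median(array)
        per_array.append([value / array_median for value in array])
    # Step 3: divide each probe set by its median across all arrays.
    n_probes = len(per_array[0])
    probe_medians = [median(array[i] for array in per_array) for i in range(n_probes)]
    return [[array[i] / probe_medians[i] for i in range(n_probes)] for array in per_array]
```

After both steps, a probe set whose value equals its cross-array median is reported as 1, so values above or below 1 indicate relative over- or under-expression on that array.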
Noise reduction in Stranger et al.'s data set
We intentionally pursued a highly conservative analysis to maximize specificity. Each population was filtered to include only genes that have a 100% detection rate across all in vitro transcriptions (IVTs) to be compared. For the first data set (4): out of the 47 293 probe sets on each array [compared between the CEU (60 samples) and YRI (60 samples) parents and children (30 samples each) groups], only 4640 probes for CEUp and YRIp and 4839 probes for CEUc and YRIc populations were detected at 100% across all IVTs. To determine the IVT replication outliers, principal component analysis of the 100% detected gene list was used. An outlier was defined as any IVT that was not within the same quarter as the other replicates in the four quarters from PC1 (x-axis) and PC2 (y-axis) (Supplementary Material, Fig. S1). There had to be at least three IVTs grouped for each cell line for inclusion in the analysis. The gene intensity variation across replicated IVTs within a population was filtered to include only those probe sets within ± 0.5 standard deviation of the mean. This resulted in the following sets of population-consistent probe sets: YRIp 3121 probe sets, CEUp 2759 probe sets, YRIc 1640 probe sets and CEUc 1520 probe sets, whose combined expression ranges were within a one standard deviation band spanning the population mean. Differentially expressed probe sets were identified using one-way ANOVA (false discovery rate of 0.01, t-test with unequal variance and Bonferroni correction for multiple testing). We then obtained the intersection of the population-consistent probe sets across YRIp and CEUp, identifying 1043 such probe sets. We compared the mean expression of the 1043 probe sets between CEUp and YRIp (t-test with P-value = 0.01 and Bonferroni correction), resulting in 958 probe sets that were significantly different between CEUp and YRIp populations.
Within the CEUc versus YRIc populations, there were 607 shared probe sets that were population consistent in their respective populations. We compared the mean expression differences of 607 probe sets between CEUc and YRIc using t-test as previously described; this resulted in 568 probe sets that were significantly different between CEUc and YRIc populations. Of the above 958 and 568 differentially expressed probes, 228 probe sets were differentially expressed in both parent and child populations. When the same analysis was performed applying the same rigorous filtering on a smaller data set of eight CEU and eight YRI founder males, the only gene differentially expressed was WRN on the Affymetrix Human Focus Array (14).
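The correction-and-intersection logic above can be illustrated with a small sketch. It assumes per-probe-set P-values have already been computed (for example by an unequal-variance t-test, which is not implemented here); the function names, probe-set identifiers and P-values are hypothetical and are not taken from the study.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Return probe sets whose P-value passes a Bonferroni-corrected cutoff.

    `p_values` maps probe-set id -> raw P-value; the cutoff is alpha / number of tests.
    """
    threshold = alpha / len(p_values)
    return {probe for probe, p in p_values.items() if p <= threshold}


def shared_differential(parent_p, child_p, alpha=0.01):
    """Probe sets significant in both the parent and the child comparison."""
    return bonferroni_significant(parent_p, alpha) & bonferroni_significant(child_p, alpha)


# Toy data (hypothetical ids and P-values, for illustration only).
parents = {"probe_a": 1e-9, "probe_b": 5e-3, "probe_c": 1e-8}
children = {"probe_a": 1e-7, "probe_c": 0.2, "probe_d": 1e-10}
shared = shared_differential(parents, children)
```

As a worked figure, with 1043 tests at alpha = 0.01 the per-test Bonferroni cutoff is 0.01/1043, roughly 9.6 × 10⁻⁶.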
The 228 probe sets network analysis
We used the Ingenuity Pathways Analysis program (IPA; Ingenuity® Systems, www.ingenuity.com) to analyze the set of differentially expressed probe sets. Of the 228 probe sets, we excluded 11 expressed sequence tags (ESTs), and the remaining 217 probe sets were mapped into IPA, with 140 of the 217 probe sets specifically mapping into the functions/pathways by RefSeq accession numbers. With removal of redundant gene symbols, 101 genes in total enriched 269 functions and diseases annotations (FAs). Of the 269 FAs significantly enriched within the 228-probe list, we removed 237 enriched FAs that had fewer than three genes, P-values >0.05 and/or redundant names, resulting in a final 32 FA categories enriched in the differentially expressed gene list comparing CEU and YRI samples. The 32 enriched FAs comprise 87 (86%) of the overall 101 genes annotated in FAs by the IPA package (data not shown).
Viral titers
Cell-line-specific viral titers were shared with us courtesy of David Altshuler and Roman Yelensky (Broad Institute, Cambridge, MA, USA). Relative EBV copy number was determined by the difference of CT method (2) and log-transformed. EBV measurements were obtained when cell-lines were first received from the Coriell Institute in 2005.
Cross platform validation of the 228 genes in Yelensky et al.'s Affymetrix data set
The 228 genes identified with COO differential expression from Stranger et al. samples (Illumina platform) were validated across platforms using an independent study of the same samples from the CEU and YRI populations on the Affymetrix GeneChip Human Genome U133 Array Set HG-U133A (15). The initial 228-gene list mapped to 352 probe sets on the HG-U133A array by RefSeq accession number. Of the 228 genes that were significantly different on the Illumina platform between CEU and YRI, there were 78 probe sets of the same genes that were significantly different at a P-value cutoff of 0.05 with Benjamini–Hochberg multiple testing correction on the Affymetrix platform. The WRN gene was also among the genes that were significantly different on the Affymetrix platform, a finding that was confirmed in a third independent study of Storey et al.'s data on the Affymetrix Human Focus Arrays.
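A sketch of the cross-platform check, assuming each platform yields a per-gene significance flag and a signed YRI-versus-CEU difference; the dictionary layout, function name and toy values are hypothetical. It keeps genes that are significant on both platforms and whose direction of change agrees, mirroring the removal of discordant probe sets described in the Results.

```python
def concordant_genes(illumina, affymetrix):
    """Keep genes significant on both platforms with the same direction of change.

    Each argument maps gene symbol -> (is_significant, signed_difference),
    where signed_difference is, e.g., mean(YRI) - mean(CEU) on a log scale.
    """
    kept = set()
    for gene, (sig_a, diff_a) in illumina.items():
        if gene not in affymetrix:
            continue
        sig_b, diff_b = affymetrix[gene]
        same_direction = diff_a * diff_b > 0
        if sig_a and sig_b and same_direction:
            kept.add(gene)
    return kept


# Toy values (hypothetical, for illustration only).
illumina = {"WRN": (True, 0.8), "GENE_X": (True, -0.4), "GENE_Y": (True, 0.3)}
affymetrix = {"WRN": (True, 0.5), "GENE_X": (True, 0.6), "GENE_Y": (False, 0.2)}
validated = concordant_genes(illumina, affymetrix)
```

In this toy input, GENE_X is dropped for discordant direction and GENE_Y for lack of significance on the second platform.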
Squared Pearson correlation coefficients (R2)
We performed linear regression analyses to determine the squared Pearson correlation coefficients (R²) and P-values of WRN (dependent variable) mRNA expression in a pairwise manner to all 78 probe sets cross-platform validated with ACOO differential expression. We reported the genes with an R² cutoff of 0.7 or greater (Table 3).
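For a regression with a single predictor, R² equals the square of the Pearson correlation coefficient, so the pairwise screen can be sketched without a statistics library. The function names and toy expression vectors below are hypothetical, and the regression P-values reported in Table 3 are not computed here.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def correlated_with(target, candidates, cutoff=0.7):
    """Return {name: R^2} for candidates whose R^2 with `target` meets the cutoff."""
    scores = {name: r_squared(values, target) for name, values in candidates.items()}
    return {name: score for name, score in scores.items() if score >= cutoff}


# Toy expression vectors (hypothetical): one tracks the target, one does not.
wrn_like = [1.0, 2.0, 3.0, 4.0]
probes = {"tracks_target": [2.0, 4.0, 6.0, 8.0], "unrelated": [1.0, -1.0, 1.0, -1.0]}
selected = correlated_with(wrn_like, probes)
```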
Intersection of the immortalized cell gene list with the non-immortalized significantly different gene list
We used an in-house unpublished data set of AA and CA samples consisting of 43 male and female children from 1 to 16 years of age. These samples were collected as control samples in an unrelated study of autism spectrum disorder (ASD). LCs were isolated and RNA extracted (without EBV immortalization) and hybridized to the Affymetrix U133plus2 array. The initial 228-gene list mapped to 352 probe sets on the U133plus2 array by RefSeq accession number. Statistical inference was determined using a parametric test (Student's t-test with variance assumed unequal, P-value cutoff 0.05, with Benjamini–Hochberg multiple test correction). Of the 524 across-platform-intersected probes, 288 probe sets had a significant difference between the AA and CA cohorts. We cross-array (U133plus2 to U133A) matched the RefSeq numbers of the 288 probes, yielding 299 probes for intersection across platforms. We intersected the 299 probe sets with the across-platform-confirmed 78 probe sets that have discordant expression between CEU and YRI trios.
Immortalization network enrichment
Twelve of the 14 probe sets identified as immortalized cell specific were enriched in IPA and mapped to 12 independent genes (two were unmapped ESTs). The genes clustered into three networks, with 10 genes mapped into the top network of DNA replication, recombination and repair with a P-value of 10⁻²⁷. JMJD1B and PUM1 mapped separately to Networks 2 and 3. The 10 genes from Network 1 were exported into the IPA editor to construct the Immortalization Network including JMJD1B and PUM1. To determine whether any of the additional genes in this network have significant ACOO differential expression (subsequent to finding the marked network enrichment score), we relaxed the statistical inference cutoffs in three ways. First, we no longer filtered the genes to meet the intra-population consistency criterion. Second, we relaxed the P-value cutoff from 0.01 to 0.05 and, finally, we changed the multiple test correction from Bonferroni to Benjamini–Hochberg for statistical inference for the Illumina Platform only.
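The effect of moving from Bonferroni at 0.01 to Benjamini–Hochberg at 0.05 can be seen in a small sketch. The two functions below are standard textbook forms of the procedures, written here for illustration rather than taken from the software the authors used, and the P-values are hypothetical.

```python
def bonferroni(p_values, alpha):
    """Indices of tests rejected under Bonferroni correction."""
    cutoff = alpha / len(p_values)
    return {i for i, p in enumerate(p_values) if p <= cutoff}


def benjamini_hochberg(p_values, alpha):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    # Find the largest rank k whose ordered P-value is at most k * alpha / m.
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    return set(order[:largest_rank])


toy_p = [0.001, 0.012, 0.025, 0.035, 0.2]  # hypothetical P-values
strict = bonferroni(toy_p, alpha=0.01)
relaxed = benjamini_hochberg(toy_p, alpha=0.05)
```

On these toy values the strict setting keeps one test and the relaxed setting keeps four, which is the kind of gain in sensitivity the relaxed filter trades against specificity.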
ACOO-specific eQTLs
The eQTLs were determined using GGtools 3.0, a Bioconductor (27) package written by Vince Carey. Here we used only the founder population (60 parents) for the CEU and YRI cohorts. An eQTL was considered of interest only when its significance was discordant between the YRI and CEU populations. A significant cis eQTL is defined as an SNP that is correlated with a gene's expression, lies within 50 kb of the 5' or 3' end of the gene and has a P-value less than or equal to 10^-8 (that is, –log10 P of at least 8).
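The cis window and the discordance criterion can be expressed as a short Python sketch. It does not reproduce GGtools; all names and coordinates are hypothetical. Two assumptions are made: the significance threshold is read as P ≤ 10^-8, and SNPs inside the gene body are counted as cis along with those in the 50 kb flanks. The SNP and gene are assumed to lie on the same chromosome.

```python
CIS_WINDOW_BP = 50_000  # 50 kb flank on either side of the gene
P_CUTOFF = 1e-8         # assumed reading of the significance threshold


def is_cis(snp_pos, gene_start, gene_end, window=CIS_WINDOW_BP):
    """True if the SNP lies within the gene or its flanking window."""
    return gene_start - window <= snp_pos <= gene_end + window


def significant_cis_eqtl(snp_pos, gene_start, gene_end, pvalue):
    """True if the SNP is in cis and passes the P-value cutoff."""
    return is_cis(snp_pos, gene_start, gene_end) and pvalue <= P_CUTOFF


def discordant(sig_ceu, sig_yri):
    """True if the eQTL is significant in exactly one population."""
    return sig_ceu != sig_yri


# Hypothetical SNP-gene pair tested separately in each founder population.
gene_start, gene_end, snp_pos = 1_000_000, 1_030_000, 1_040_000
sig_ceu = significant_cis_eqtl(snp_pos, gene_start, gene_end, pvalue=3e-10)
sig_yri = significant_cis_eqtl(snp_pos, gene_start, gene_end, pvalue=2e-4)
print(discordant(sig_ceu, sig_yri))
```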
SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG online.
FUNDING
This work was supported in part by National Library of Medicine
[U54LM008748-03 to I.S.K.] and National Human Genome Research
Institute [T32HG02295 to A.R.D.]. Funding to pay the Open Access
publication charges for this article was provided by National
Library of Medicine [U54LM008748-03].
ACKNOWLEDGEMENTS
The authors are indebted to Zoltan Szallasi and Simon Kasif
for critical reading and suggestions regarding biological validation.
They also recognize the generous support of David Altshuler
and Roman Yelensky in providing the relative EBV viral titer
data. They also thank Vincent Carey for assistance with R-GUI
and the Bioconductor packages GGtools and GGdata, and Sek Won Kong,
Christin Collins, Ingrid Holm and Lou Kunkel for providing the
expression arrays of the African American and Caucasian controls
from their autism study.
Conflict of Interest statement. None declared.
REFERENCES
1. Allocco D.J., Song Q., Gibbons G.H., Ramoni M.F., Kohane I.S. Geography and genography: prediction of continental origin using randomly selected single nucleotide polymorphisms. BMC Genomics (2007) 8:e68.
2. Echols M.R., Yancy C.W. Isosorbide dinitrate–hydralazine combination therapy in African Americans with heart failure. Vasc. Health Risk Manag. (2006) 2:423–431.
3. Jorgenson E., Tang H., Gadde M., Province M., Leppert M., Kardia S., Schork N., Cooper R., Rao D.C., Boerwinkle E., et al. Ethnicity and human genetic linkage maps. Am. J. Hum. Genet. (2005) 76:276–290.
4. Stranger B.E., Forrest M.S., Clark A.G., Minichiello M.J., Deutsch S., Lyle R., Hunt S., Kahl B., Antonarakis S.E., Tavare S., et al. Genome-wide associations of gene expression variation in humans. PLoS Genet. (2005) 1:e78.
5. Stranger B.E., Nica A.C., Forrest M.S., Dimas A., Bird C.P., Beazley C., Ingle C.E., Dunning M., Flicek P., Koller D., et al. Population genomics of human gene expression. Nat. Genet. (2007) 39:1217–1224.
6. Tishkoff S.A., Kidd K.K. Implications of biogeography of human populations for race and medicine. Nat. Genet. (2004) 36:S21–S27.
7. Cheadle C., Becker K.G., Cho-Chung Y.S., Nesterova M., Watkins T., Wood W. 3rd, Prabhu V., Barnes K.C. A rapid method for microarray cross platform comparisons using gene expression signatures. Mol. Cell Probes (2007) 21:35–46.
8. Kuo W.P., Jenssen T.K., Butte A.J., Ohno-Machado L., Kohane I.S. Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics (Oxford, England) (2002) 18:405–412.
9. Eklund A.C., Szallasi Z. Correction of technical bias in clinical microarray data improves concordance with known biological information. Genome Biol. (2008) 9:R26.
10. Burkitt D. A sarcoma involving the jaws in African children. Br. J. Surg. (1958) 46:218–223.
11. Mutalima N., Molyneux E., Jaffe H., Kamiza S., Borgstein E., Mkandawire N., Liomba G., Batumba M., Lagos D., Gratrix F., et al. Associations between Burkitt lymphoma among children in Malawi and infection with HIV, EBV and malaria: results from a case–control study. PLoS ONE (2008) 3:e2505.
12. Ogwang M.D., Bhatia K., Biggar R.J., Mbulaiteye S.M. Incidence and geographic distribution of endemic Burkitt lymphoma in northern Uganda revisited. Int. J. Cancer (2008) 123:2658–2663.
13. Wakabi W. Kenya and Uganda grapple with Burkitt lymphoma. Lancet Oncol. (2008) 9:e319.
14. Storey J.D., Madeoy J., Strout J.L., Wurfel M., Ronald J., Akey J.M. Gene-expression variation within and among human populations. Am. J. Hum. Genet. (2007) 80:502–509.
15. Choy E., Yelensky R., Bonakdar S., Plenge R.M., Saxena R., De Jager P.L., Shaw S.Y., Wolfish C.S., Slavik J.M., Cotsapas C., et al. Genetic analysis of human traits in-vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet. (2008) 4:e1000287.
16. Kong S., Collins C., Holm I., Kunkel L. Control Samples of Autism Spectrum Disorder Hospital Program in Genomics (2009) Boston, MA, USA: Harvard Medical School.
17. Mayburd A.L., Martínez A., Sackett D., Liu H., Shih J., Tauler J., Avis I., Mulshine J.L. Ingenuity network-assisted transcription profiling: identification of a new pharmacologic mechanism for MK886. Clin. Cancer Res. (2006) 12:1820–1827.
18. Sugimoto M., Tahara H., Ide T., Furuichi Y. Steps involved in immortalization and tumorigenesis in human B-lymphoblastoid cell lines transformed by Epstein–Barr virus. Cancer Res. (2004) 64:3361–3364.
19. Sugimoto M., Tahara H., Okubo M., Kobayashi T., Goto M., Ide T., Furuichi Y. WRN gene and other genetic factors affecting immortalization of human B-lymphoblastoid cell lines transformed by Epstein–Barr virus. Cancer Genet. Cytogenet. (2004) 152:95–100.
20. Lebel M., Leder P. A deletion within the murine Werner syndrome helicase induces sensitivity to inhibitors of topoisomerase and loss of cellular proliferative capacity. Proc. Natl Acad. Sci. USA (1998) 95:13097–13102.
21. Leder A., Lebel M., Zhou F., Fontaine K., Bishop A., Leder P. Genetic interaction between the unstable v-Ha-RAS transgene (Tg.AC) and the murine Werner syndrome gene: transgene instability and tumorigenesis. Oncogene (2002) 21:6657–6668.
22. Faumont N., Durand-Panteix S., Schlee M., Gromminger S., Schuhmacher M., Holzel M., Laux G., Mailhammer R., Rosenwald A., Staudt L.M., et al. c-Myc and Rel/NF-kappaB are the two master transcriptional systems activated in the latency III program of Epstein–Barr virus-immortalized B cells. J. Virol. (2009) 83:5014–5027.
23. Yi F., Saha A., Murakami M., Kumar P., Knight J.S., Cai Q., Choudhuri T., Robertson E.S. Epstein–Barr virus nuclear antigen 3C targets p53 and modulates its transcriptional and apoptotic activities. Virology (2009) 388:236–247.
24. Michiels S., Danoy P., Dessen P., Bera A., Boulet T., Bouchardy C., Lathrop M., Sarasin A., Benhamou S. Polymorphism discovery in 62 DNA repair genes and haplotype associations with risks for lung and head and neck cancers. Carcinogenesis (2007) 28:1731–1739.
25. Shiratori M., Suzuki T., Itoh C., Goto M., Furuichi Y., Matsumoto T. WRN helicase accelerates the transcription of ribosomal RNA as a component of an RNA polymerase I-associated complex. Oncogene (2002) 21:2447–2454.
26. Suzuki N., Shimamoto A., Imamura O., Kuromitsu J., Kitao S., Goto M., Furuichi Y. DNA helicase activity in Werner's syndrome gene product synthesized in a baculovirus system. Nucleic Acids Res. (1997) 25:2973–2978.
27. Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. (2004) 5:R80.
28. Carey V.J., Davis A.R., Lawrence M.F., Gentleman R., Raby B.A. Data structures and algorithms for analysis of genetics of gene expression with Bioconductor: GGtools 3.x. Bioinformatics (Oxford, England) (2009) 25:1447–1448.
