Human Molecular Genetics Advance Access originally published online on October 21, 2003
Human Molecular Genetics, 2003, Vol. 12, No. 24 3245-3258
DOI: 10.1093/hmg/ddg347
© 2003 Oxford University Press
Classifying the estrogen receptor status of breast cancers by expression profiles reveals a poor prognosis subpopulation exhibiting high expression of the ERBB2 receptor
1National Cancer Centre, 2Department of Pathology and 3Defence Medical Research Institute, 11 Hospital Drive, Singapore 169610, Republic of Singapore and 4Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Republic of Singapore
Received July 4, 2003; Accepted October 9, 2003
ABSTRACT
Recent work using expression profiling to computationally predict the estrogen receptor (ER) status of breast tumors has revealed that certain tumors are characterized by a high prediction uncertainty (low-confidence). We analyzed these low-confidence tumors and determined that their uncertain prediction status arises as a result of widespread perturbations in multiple genes whose expression is important for ER subtype discrimination. Patients with low-confidence ER+ tumors exhibited a significantly worse overall survival (P=0.03) and shorter time to distant metastasis (P=0.004) compared with their high-confidence ER+ counterparts, indicating that the high- and low-confidence binary distinction is clinically meaningful. We then discovered that elevated expression of the ERBB2 receptor is significantly correlated with a breast tumor exhibiting a low-confidence prediction, and this association was subsequently validated across multiple independently derived breast cancer expression datasets employing a variety of different array technologies and patient populations. Although ERBB2 signaling has been proposed to inhibit the transcriptional activity of ER, a large proportion of the perturbed genes in the low-confidence/ERBB2+ samples are not known to be estrogen responsive, and a recently described bioinformatic algorithm (DEREF) was used to demonstrate the absence of potential estrogen-response elements (EREs) in their promoters. We propose that a significant portion of ERBB2's effects on ER+ breast tumors may involve ER-independent mechanisms of gene activation, which may contribute to the clinically aggressive behavior of the low-confidence breast tumor subtype.
INTRODUCTION
The classification of breast tumors into estrogen receptor positive (ER+) and negative (ER-) subtypes is a critical distinction in the treatment of breast cancer. ER- tumors are in general more clinically aggressive than their ER+ counterparts, and ER+ tumors are routinely treated using anti-hormonal therapies such as tamoxifen (1). Presently, a tumor's ER status is routinely determined by immunohistochemistry (IHC) or immunoblotting using an antibody to ER. This technique, however, is imperfect: for example, it may fail to detect tumors harboring genetic alterations in ER that render it inactive or constitutively active (2). Thus, it is crucially important to develop more accurate methodologies to improve the ER subtype classification of breast tumors, so that the appropriate therapies can be subsequently applied.
A number of groups have recently published reports utilizing expression profile data to classify breast cancers into ER+ and ER- categories. In one study, it was found that the expression profiles of ER+ and ER- tumors are remarkably distinct, supporting previous theories that ER+ and ER- tumors may arise from distinct breast epithelial cell types (3). Another group has reported the use of supervised learning methodologies on expression data to classify breast tumors by ER subtype (4). One common observation in these studies was that, although the majority of breast tumors could usually be accurately classified into ER+ and ER- subtypes to a high degree of certainty, there always existed a set of low-confidence samples that were either misclassified or where the statistical confidence of the predictions was marginal. Although it was proposed that these low-confidence samples might reflect the effects of population heterogeneity (4), the hypothesis that such low-confidence samples might be biologically distinct from their high-confidence counterparts has not been fully explored to date.
The experiments in this report were motivated by the possibility that the low-confidence samples might possess distinct biological characteristics. We performed a classification analysis using an in-house generated breast cancer expression dataset, and determined that, in comparison to the high-confidence tumors, the low-confidence tumors exhibited widespread perturbations in the expression of multiple genes important for ER subtype discrimination. Although initially derived through purely computational means, the distinction between high- and low-confidence tumors is clinically meaningful, as low-confidence ER+ tumors exhibited a significantly worse overall survival and shorter time to distant metastasis than their high-confidence ER+ counterparts. Such a distinction is currently not discernible by conventional immunohistochemical strategies used to detect ER. We then unexpectedly discovered that high expression levels of the ERBB2 receptor are significantly correlated with breast tumors exhibiting a low-confidence prediction, and validated this association across three independently derived breast cancer expression datasets generated from different patient populations/array technologies, and analyzed using different computational methods. The association between ERBB2 expression and the widespread perturbations of ER-discriminator genes observed in the low-confidence tumors is intriguing, as ERBB2 activity is known to contribute, in both breast tumors and cell lines, towards the development of resistance to anti-hormonal therapies (5,6), and to inhibit the transcriptional activity of ER (5,7). However, we found that a significant proportion of these perturbed genes, despite being important for ER subtype discrimination, are not known to be estrogen responsive and, using a recently described bioinformatic algorithm (DEREF), also demonstrated that these genes do not contain potential estrogen-response elements (EREs) in their promoters.
Our results suggest that, in addition to current models where ERBB2 acts primarily by disrupting the transcriptional activity of ER, a significant fraction of ERBB2's effects on ER+ breast tumors may involve ER-independent mechanisms of gene activation as well, which may collectively contribute to the clinically aggressive nature of the low-confidence breast tumor subtype.
RESULTS
Classification of breast tumors by ER status using expression profiles from Chinese patients reveals a distinct population of low-confidence samples
The overall incidence patterns of breast cancer in Caucasian and Asian populations are distinct (8), prompting us to investigate if findings from previous reports (3,4) could also be observed in our local patient population. We first used gene expression profile data to classify a set of breast tumors by their ER status. A training set of 55 breast tumors was selected, where the ER status of each tumor was pre-determined using IHC. Two classification methods were tested: weighted-voting (WV) and support vector machines (SVM), and classification accuracy was assessed through leave-one-out cross-validation (LOOCV; Supplementary Material). In addition to classifying a sample, quantitative metrics were used to provide an assessment of classification uncertainty (Materials and Methods). For this and all subsequent analyses (including independent data sets), similar results were obtained when the cutoff threshold defining a high versus low confidence sample was varied by ±10% (Supplementary Material). The overall classification accuracy on the training set was 95% (WV) and 96% (SVM), with seven samples characterized by low-confidence or marginal predictions (gray box, Fig. 1A). To determine if such low-confidence samples could also be observed in an independent set of tumors, a second set of 41 tumors was used as an independent test set. Although the overall classification accuracy on the independent test set was 91% (WV and SVM), nine samples once again displayed a low-confidence prediction (Fig. 1B). Thus, using two different classification methods (WV and SVM), certain breast tumors were found to exhibit a distinct low-confidence character when being classified by ER status on the basis of their gene expression profiles.
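To illustrate how a classifier can report both a class call and a measure of confidence, the following is a minimal sketch of weighted voting in the form commonly applied to expression data: each gene is weighted by a signal-to-noise score, and the prediction strength is the normalized margin between the winning and losing vote totals. The exact formulation, gene selection and cutoff used in this study are given in Materials and Methods and the Supplementary Material; the function names and the 0.3 cutoff below are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def train_weighted_voting(samples, labels):
    """Return one (weight, boundary) pair per gene.

    samples: list of expression vectors, one list of floats per tumor
    labels:  list of class labels, "ER+" or "ER-"
    """
    pos = [s for s, y in zip(samples, labels) if y == "ER+"]
    neg = [s for s, y in zip(samples, labels) if y == "ER-"]
    model = []
    for g in range(len(samples[0])):
        p = [s[g] for s in pos]
        n = [s[g] for s in neg]
        mu_p, mu_n = mean(p), mean(n)
        spread = pstdev(p) + pstdev(n)
        # Signal-to-noise weight: positive if the gene is higher in ER+.
        weight = (mu_p - mu_n) / spread if spread > 0 else 0.0
        boundary = (mu_p + mu_n) / 2
        model.append((weight, boundary))
    return model


def predict(model, sample):
    """Return (label, prediction strength in [0, 1]) for one tumor."""
    votes_pos = votes_neg = 0.0
    for (weight, boundary), x in zip(model, sample):
        vote = weight * (x - boundary)
        if vote > 0:
            votes_pos += vote
        else:
            votes_neg += -vote
    total = votes_pos + votes_neg
    strength = abs(votes_pos - votes_neg) / total if total > 0 else 0.0
    label = "ER+" if votes_pos >= votes_neg else "ER-"
    return label, strength


def loocv(samples, labels, cutoff=0.3):
    """Leave-one-out cross-validation; cutoff is a hypothetical threshold."""
    results = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        label, strength = predict(train_weighted_voting(train_x, train_y), samples[i])
        confidence = "high" if strength >= cutoff else "low"
        results.append((label, strength, confidence))
    return results
```

In this sketch a tumor whose discriminator genes vote in conflicting directions receives a prediction strength near zero and is flagged as low-confidence, even if its class call happens to be correct.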
Patients with low-confidence ER+ tumors exhibit decreased overall survival and shorter time to distant metastasis in comparison to patients with high-confidence ER+ tumors
Since the differentiation of tumors into high- and low-confidence sub-populations was achieved through a purely computational analysis of tumor gene expression profiles, it is unclear if this distinction is biologically or clinically meaningful, and if the use of gene expression profiles in this manner affords any substantial advantage over conventional immunohistochemical techniques to determine the ER status of breast tumors. To address this issue, we investigated if the low-confidence tumors might exhibit any clinical behaviors distinct from their high-confidence counterparts. We used two publicly available breast cancer expression data sets for which related but distinct types of clinical information were available. The first set (9) consists of a cDNA microarray data set of 78 breast carcinomas and seven non-malignant samples with overall patient survival information (referred to as the Stanford data set). The second set (10) consists of 71 ER+ and 46 ER- lymph-node negative tumors profiled using oligonucleotide-based microarrays, and for 97 tumors the time interval from initial tumor diagnosis to the appearance of a new distant metastasis was available (referred to as the Rosetta data set). We used WV to classify the breast tumors in the Stanford and Rosetta datasets by their ER subtype (Supplementary Material). Consistent with our own data set, among the 56 ER+ and 18 ER- tumors in the Stanford data set (four tumors were removed due to lack of ER status or other clinical information), we observed an overall LOOCV accuracy of 93%, with 14 tumors (18.9%) being classified as low-confidence. Similarly, the WV analysis also identified 18 out of 117 tumors (15.4%) in the Rosetta data set as exhibiting a low-confidence classification, with an overall LOOCV accuracy of 92%. These figures are comparable to those observed in our own patient population, suggesting that low-confidence tumors constitute between 15 and 19% of the overall breast tumor population.
We then compared the clinical behaviour of the high- and low-confidence tumor populations using Kaplan-Meier analysis. As shown in Figure 2, patients with low-confidence tumors exhibited a significantly worse overall survival (P=0.0003, log-rank test) and shorter time to distant metastasis (P=0.0001, log-rank test) than their high-confidence counterparts. This result indicates that the high versus low-confidence binary distinction is indeed clinically meaningful. We then repeated this analysis under conditions where the tumors were first subdivided into independent ER+ and ER- categories. For ER+ tumors, we once again found that low-confidence ER+ tumors were associated with a significantly worse overall survival (P=0.03, log-rank test) and shorter time to metastasis (P=0.004, log-rank test; Fig. 2) than high-confidence ER+ tumors. No statistically significant differences in overall survival and time to metastasis were observed for the ER- tumors (P. Tan, unpublished data). These results indicate that ER+ tumors can be subdivided on the basis of the high- and low-confidence binary classification into distinct disease groups exhibiting different clinical behaviors. Since distinguishing between these two groups is currently not possible by conventional immunohistochemical methods used for ER detection, this result also demonstrates how gene expression profile data can be a useful adjunct to conventional strategies for breast cancer prognostication and staging.
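For readers unfamiliar with the survival comparison described above, the following is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test. It is a textbook formulation written for illustration; the function names are assumptions, and it is not the software used to produce Figure 2. Each patient contributes a follow-up time and an event flag (1 if death or metastasis was observed, 0 if the patient was censored).

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, P value), 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance if variance > 0 else 0.0
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Applied to the low- and high-confidence groups, `kaplan_meier` would give the two curves to plot and `log_rank` the P value for the difference between them.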
Low-confidence tumors exhibit widespread perturbations in the expression of genes important for ER subtype discrimination
The classification algorithms used in these and other studies (e.g. WV, SVM, ANN, see below) all rely upon the combinatorial input of multiple discriminator genes whose individual contributions are then combined to arrive at a particular classification decision (i.e. if the tumor is ER+ or ER-). It is formally possible that the low-confidence prediction status of these breast tumors is due to either the dramatic deregulation of a few key discriminator elements (i.e. specific effects), or the more subtle perturbation of a large number of discriminator genes (i.e. widespread effects). To distinguish between these two possibilities, we compared the expression levels of genes important for ER subtype discrimination between high- and low-confidence tumors. First, to identify ER-discriminating genes which were differentially regulated between ER+ and ER- tumors, we utilized a statistical technique called significance analysis of microarrays (SAM) (11). Employing our combined dataset (total number=96 tumors), a total of 133 differentially regulated genes (SAM-133) were identified at a false discovery rate (FDR) of 0% (the FDR is an index used by SAM to estimate the number of false positives; an FDR of 10% for 100 genes indicates that 10 genes are likely to be false positives). In this set, 122 genes were up-regulated in ER+ samples (i.e. positively correlated to ER status), while the remaining 11 were down-regulated in ER+ tumors (i.e. negatively correlated to ER). As predicted, the SAM-133 gene set includes a number of genes related to the ER pathway, such as ESR1, LIV1 (an estrogen-inducible gene) and TFF1, and some genes (e.g. GATA-3) were identified multiple times. A number of genes in the SAM-133 list are also found in similar lists reported by others (3,4).
We then subdivided the ER+ and ER- tumors each into high- and low-confidence categories (i.e. ER+/High, ER+/Low, ER-/High, ER-/Low), and the expression levels of the SAM-133 genes were compared between the groups (Fig. 3). Of the 122 genes in the SAM-133 gene set that were positively correlated to ER status, 62% exhibited a significantly lower average expression level (referred to as perturbed expression) in the ER+/Low samples compared to the ER+/High tumors (P<0.05, Fig. 3A and Table 2). Genes with perturbed expression included ER, GATA3, BCL2, IGF1R and RARA, while other ER-discriminator genes, such as TFF1, TFF3 and XBP1, were unaffected. Similarly, in the ER- high- and low-confidence samples, we witnessed a reciprocal pattern where 42% of the 122 genes exhibited a higher average expression level in the ER-/Low samples compared with the ER-/High tumors (P<0.05, Fig. 3B and Table 2). Intriguingly, although the expression levels of certain genes (e.g. GATA3, BCL2) were commonly perturbed between low- and high-confidence samples in both the ER+ and ER- subtypes, the perturbation of other genes appeared to be subtype-specific. For example, ESR1 and IGF1R were only perturbed in the ER+ samples, while XBP1 was only perturbed in the ER- samples. Finally, there were minimal changes in the expression levels of ER-discriminating genes that were negatively correlated to ER+ status (i.e. highly expressed in ER- tumors; Fig. 3C and D). This result suggests that the expression perturbations observed in the low-confidence samples, although widespread, are primarily observed in genes whose expression is positively correlated to ER (Supplementary Material).
Elevated expression of the ERBB2 oncogene is significantly associated with the low-confidence predictions
The expression perturbations observed in the low-confidence breast tumors could be due to multiple reasons, ranging from experimental variation (e.g. poor sample quality, tumor excision and handling), choice of the classification method, to population and sample heterogeneity. To gain insights into the possible mechanisms underlying these expression perturbations, we attempted to determine if there were any specific histopathological parameters that might be correlated to the low-confidence state. No significant associations were observed between the low-confidence status of a tumor and patient age, lymph node status, tumor grade, p53 mutation status or progesterone receptor status (Table 1). We discovered, however, a significant positive association (P<0.001, Supplementary Material) between a tumor's ERBB2 status and a low-confidence prediction. This correlation, observed using the training set data, was then assessed using the independent test set samples. Of the nine low-confidence samples in the independent test set, eight tumors were also ERBB2+ (8/9), indicating that this association is not dataset-specific. Supporting the association between low-confidence status and ERBB2 expression, a principal components analysis (PCA) of the 96 tumors on the basis of the SAM-133 genes effectively subdivided the high confidence ER+ and ER- tumors, with the ERBB2+ samples falling in an intermediate low-confidence area (Fig. 3E).
We also investigated if the correlation between low-confidence prediction strength and high ERBB2 expression could have been discovered independently by comparing the global expression profiles of high- and low-confidence tumors. First, we compared the high-confidence and low-confidence tumors belonging to the ER+ subtype. A total of 89 genes were identified as being significantly regulated (FDR=14%). Among the top 50 most significantly up-regulated genes in the ER+ low-confidence samples, three genes, PMNT (ranked fourth), GRB7V (eighth) and ERBB2 (36th), were of particular interest (Supplementary Material), as they are all physically located on the 17q21 region, a frequent target of DNA amplification in breast cancer (12). In a separate analysis, the ER- high-confidence and ER- low-confidence samples were also compared. Among the top 50 genes identified as being differentially regulated (FDR=4%), we once again identified the 17q21 genes PMNT (ranked fifth), GRB7V (10th) and ERBB2 (28th), and a fourth 17q21 gene (hypothetical gene MGC9753), as exhibiting increased expression in the low-confidence samples (Supplementary Material). Indeed, the 17q21 locus was the most commonly identified genomic location for genes exhibiting increased expression in the low-confidence ER+ and ER- samples, being represented at almost twice the frequency of the next most common locus (1q21). A permutation analysis utilizing 10 000 randomly generated 50-member gene sets also revealed that the probability at which the same three 17q21 loci might have been selected by chance in two 50-member gene sets was approximately 4.4 × 10^-10, suggesting that the identification of the 17q21 locus in the SAM-lists for both ER+ and ER- subtypes is significant (Supplementary Material).
Taken collectively, these results suggest that for both the ER+ and ER- subtypes, the low-confidence breast tumors are significantly associated with increased expression of ERBB2 in comparison with the high-confidence tumors, most likely resulting from DNA amplification of the 17q21 locus. We note, however, that the association between low-confidence prediction and ERBB2+ expression, although highly significant, is not perfect, as a few tumors that were designated as ERBB2+ by conventional IHC exhibited high-confidence predictions, while not all low-confidence tumors are ERBB2+. One possibility is that other genes besides ERBB2 may also contribute to a breast tumor exhibiting a low-confidence state.
To validate our finding, we then analyzed the other independently derived breast cancer expression datasets. First, of the nine ERBB2+ tumors in the Stanford data set, all nine were predicted as being in the low-confidence group (P<0.001, Table 1 and Supplementary Material). Second, in the Rosetta data set, we once again found a significant association between the confidence level of prediction and ERBB2 expression (P<0.001, Table 1 and Supplementary Material). Third, Gruvberger et al. (3) utilized artificial neural networks (ANNs) on a cDNA microarray data set of 28 ER+ and 30 ER- samples to predict the ER status of breast tumors. Their results, shown in Figure 4B, depict the output of the ANN model with sample standard deviations (SD), as assessed using the top 100 discriminator genes for ER subtype. Samples with a large SD are analogous to the low-confidence samples of the WV and SVM methodologies. As can be seen from Figure 4B, ERBB2+ samples (determined in Fig. 4A) tend to be associated with large SDs, which indicate high uncertainty, particularly for ER+ tumors. Taken collectively, the association between the confidence level of ER prediction and ERBB2 status was observed on a wide range of data sets originating from different laboratories utilizing different microarray technologies (Affymetrix, cDNA and oligonucleotide) on different patient populations (Asian, European/Caucasian), and predicted by different classification algorithms (WV, SVM, ANN). The commonality of these results on both our data set and publicly available data sets suggests that the correlation between high ERBB2 expression and low-confidence prediction status may be an inherent feature of breast cancer in general.
A significant proportion of genes perturbed in the low-confidence samples are not known to be regulated by estrogen and lack potential EREs in their promoters
The strong correlation between high ERBB2 levels and the widespread perturbations of ER-subtype discriminating genes observed in the low-confidence tumors raises the possibility that ERBB2 may functionally contribute towards this phenomenon. One possible mechanism by which this could occur is through ERBB2 signaling, which has been proposed to inhibit the transcriptional activity of ER (see Discussion). Under this scenario, one might expect that a significant proportion of the genes perturbed between the high-confidence (ERBB2-) and low-confidence (ERBB2+) tumors would consist of genes regulated by ER. We tested this hypothesis in two ways. First, we compared our list of significantly perturbed genes (Table 2) to SAGE expression data derived from estrogen (E2) stimulated MCF-7 cells (13) to determine the extent of overlap between the two. Only two genes (STC2, TFF1) were found in common between the SAGE data and the perturbed gene list, and one (TFF1) was regulated in the opposite manner from that expected, exhibiting higher expression in the ERBB2+ samples. This result, within the limits of the cell line assay, suggests that many of the perturbed genes in the low-confidence tumors may not be directly regulated by estrogen.
Second, as in vitro cell-line studies may not fully recapitulate the effects of estrogen in vivo, we then adopted a bioinformatic approach using a recently described algorithm, Dragon Estrogen Response Element Finder (DEREF), to search for putative EREs in the promoter regions of the perturbed genes (14). The prediction accuracy of DEREF has been validated in a number of in vivo examples: it detects ERE patterns 2.8x more frequently in the promoter regions of estrogen responsive versus non-responsive genes in a microarray experiment, and 5.4x more frequently in the promoters of genes belonging to the estrogen-induced SAGE dataset versus genes whose expression is negatively correlated to ER in breast cancers (see Supplementary Material for a more extensive characterization of DEREF). Of the top 50 perturbed genes in the ER+ tumors (Table 2), the transcriptional start sites of 35 could be accurately determined and thus were subsequently analyzed by DEREF. Of these 35, EREs were detected with high-confidence in only 12 promoters (total frequency 34%) (Table 2). Conversely, of the top 50 perturbed genes in the ER- tumors, 33 were analyzed by DEREF and high-confidence EREs were detected in only three (total frequency 9%; Table 2). Thus, EREs were detected in the promoters of perturbed genes in ER+ tumors at 3.7x higher frequency than in the ER- tumors. This difference was significant by a chi-square analysis (P=0.012), suggesting that ERBB2 may affect transcription in ER+ and ER- tumors via distinct mechanisms (see Discussion). Regardless, EREs were not found to be overrepresented among the perturbed genes in either subtype (ER+ or ER-), suggesting that these genes may not be direct transcriptional targets of ER. These genes may represent either indirect targets of ER, or may be transcriptionally regulated via ER-independent mechanisms.
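The chi-square comparison above can be checked from the counts given in the text (12 of 35 promoters with a high-confidence ERE in ER+ tumors versus 3 of 33 in ER- tumors). The sketch below uses the Pearson chi-square statistic for a 2x2 table without a continuity correction; the text does not state which variant was used, so that choice is an assumption, and the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported in the text: promoters with a high-confidence ERE / analyzed.
er_pos_with, er_pos_total = 12, 35
er_neg_with, er_neg_total = 3, 33

chi2, p = chi_square_2x2(
    er_pos_with, er_pos_total - er_pos_with,
    er_neg_with, er_neg_total - er_neg_with,
)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

Under this assumption the calculation gives a chi-square of about 6.27 and P of about 0.012, in agreement with the reported value.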
DISCUSSION
There has been an intense interest in the use of gene expression profiles for biological classification, particularly in the fields of oncology and medicine. Proposed advantages of the profiling approach include the following. First, expression profiles can potentially define clinically relevant subtypes of cancer that have previously eluded more conventional approaches such as light-microscopy and IHC (15,16). Second, in contrast to single molecular markers, the ability to simultaneously monitor multiple genes can often provide a useful insight into the activity state of clinically significant cellular and tumorigenic pathways. Third, depending on the scoring pathologist, results from IHC may sometimes be misleading due to the presence of isolated aberrant regions; such a situation may occur when a tumor is designated ER+ due to a small area of the tumor exhibiting intense ER staining, while the remainder/majority of the tumor remains devoid of such staining. In contrast, because expression profiles are usually derived from the bulk of the tumor, they may better represent the overall collective biology of the composite tumor. Despite this potential, a number of issues have to be resolved before the use of gene expression data for clinical diagnosis can become a reality. For example, algorithms need to be implemented that, besides delivering the correct classification, can also accurately determine the confidence of the prediction. This is particularly important if the classification affects the subsequent course of treatment: if furnished with such information, the treating physician can then weigh the confidence of prediction against the potential morbidity of a specific intervention to make an informed clinical choice.
The findings in this report complement and extend the previous work in this area related to the classification of breast tumors by ER subtype. In general, these studies have shown that, while gene expression data can be successfully used to classify the ER subtype of most tumors, there invariably exists a certain population of tumors that exhibit a low-confidence of prediction and thus cannot be accurately classified (3,4). Since these prior studies did not extensively investigate these low-confidence samples, we performed an in-depth analysis of these low-confidence tumors and made a number of findings. We found that, in comparison with patients with high-confidence tumors, patients with low-confidence tumors exhibited a significantly worse overall survival and shorter time to distant metastasis. The high- versus low-confidence classification, arrived at by computational analysis of gene expression profiles, also served to separate ER+ tumors into groups exhibiting distinct clinical behaviors (Fig. 2). Notably, the association of the low-confidence category with adverse outcome was observed using data from two independent studies (Stanford and Rosetta), supporting the hypothesis that the low and high-confidence tumors are indeed clinically distinct. Since the discernment of such subgroups is currently not possible using conventional immuno-histopathological techniques, these results also demonstrate how the classification of a breast tumor's ER status by expression profiling and computational analysis can be medically useful.
We also made the unanticipated finding that the low-confidence state is significantly associated with elevated expression of the ERBB2 receptor. We emphasize that the connection between ERBB2 and low-confidence predictions remains an association, and that at this point we have no evidence (from our own data) that ERBB2 is functionally responsible for causing the low-confidence state. Nevertheless, given that ER and ERBB2 are currently the two most clinically relevant molecular biomarkers in breast cancer, it is tempting to speculate that these results suggest that there may exist substantial cross-talk between these two signaling pathways in breast cancer, a possibility that has also been proposed by others (7). Interestingly, in a separate analysis, we have also found that there is a significant negative association between ER+ status and ERBB2 expression for our in-house and the Stanford data set, but not for the other two datasets (Supplementary Material). We also note that the association between ERBB2+ and low-confidence prediction, although highly significant, is not perfect, as a few ERBB2+ tumors were also found to exhibit high-confidence predictions, while not all low-confidence tumors are ERBB2+. Thus, it is unlikely the low-confidence population of breast tumors could have been discerned by conventional histopathological techniques used to detect ERBB2 such as IHC and FISH. Instead, we speculate that, for tumors designated ERBB2+ by routine histopathology, further examination of these tumors for the presence of such characteristic expression perturbations may be a promising method to distinguish between tumors that are likely to be more clinically aggressive versus those that will progress along a comparatively more indolent course. Exploring this possibility will be an important task for future research.
Clinically, elevated ERBB2 expression in ER+ breast tumors has long been associated with decreased sensitivity to anti-hormonal therapies, and a recent population-based study, using more conventional analytical methods, has also found that ERBB2 is epistatic to ER status for disease prognostication (17). A number of experimental papers have also been reported addressing possible mechanisms by which ERBB2 activity might cause this effect. In general, the most popular model has been one in which elevated ERBB2 signaling causes ER to exhibit diminished transcriptional activity, either through transcriptional down-regulation of the ER gene (18), post-translational modifications of ER (e.g. phosphorylation) (19), or via induction of ER-binding corepressors such as MTA1 (20). If the effects of ERBB2 were mediated primarily through effects on ER transcriptional activity, then one might expect that a substantial number of the genes whose transcription is significantly perturbed in the ERBB2+ low-confidence samples should correspond to genes which are direct targets of ER. We found, however, that a significant proportion of the genes that were significantly perturbed in both ER+ and ER- tumors have not been previously identified as estrogen-induced genes, and these genes also appear to lack potential EREs in their promoters. This is particularly the case in the ER- tumors, in which only 9% of the significantly perturbed genes were found to contain high-confidence putative EREs in their promoters. Although we cannot rule out the possibility that these perturbed genes may be indirect targets of ER or may be activated by ER via non-ERE mechanisms, these findings raise the possibility that ERBB2 activity may regulate a significant fraction of genes in breast tumors in an ER-independent fashion. There are numerous avenues by which this could occur. 
For example, ERBB2 might regulate other transcription factors besides ER through activation of the RAS/MAPK or PI3/Akt pathways (19). Alternatively, ERBB2 activity may result in the induction of chromatin remodeling factors such as MTA1 that may cause more pleiotropic effects (20). Our findings suggest that it may be important to perform further research along these other lines as well, in order to fully understand the factors that contribute towards the low-confidence subtype of breast tumors.
| MATERIALS AND METHODS |
|---|
Breast tissue samples and patient data
Breast tissue samples and clinical data were obtained from the Tissue Repository of the National Cancer Center of Singapore, after appropriate approvals had been obtained from the institution's Repository and Ethics Committees. Samples were grossly dissected in the operating theater immediately after surgical excision, and flash-frozen in liquid N2. Histological information (ER, ERBB2) was provided by the Department of Pathology at Singapore General Hospital, and samples were selected to provide a comparable number of ER+ and ER- tumors (as determined by IHC) for each data set. Tumor samples contained >50% tumor content as assessed by cryosections. A set of 55 tumors (35 ER+ samples and 20 ER- samples) was used as training data, while a separate set of 41 tumors (21 ER+ and 20 ER- samples) was used for blind testing. A detailed list of all samples and clinical data for the patients is included in Table S1 in the Supplementary Material.
Sample preparation and microarray hybridization
RNA was extracted from tissues using Trizol reagent and processed for Affymetrix Genechip hybridizations using U133A Genechips according to the manufacturer's instructions.
Data preprocessing
Raw chip scans were quality controlled using the GeneData Refiner program and deposited into a central data storage facility. The expression data were pre-processed by removing genes whose expression was absent in all samples (i.e. A calls), subjecting the remaining genes to a log2 transformation, and median-centering by sample.
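The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name, the in-memory table layout and the toy values are hypothetical, the detection calls are assumed to be Affymetrix-style `P`/`M`/`A` flags, and the centering step is assumed to subtract each sample's median.

```python
import math
from statistics import median

def preprocess(values, calls):
    """values[g][s]: raw expression of gene g in sample s (assumed positive).
    calls[g][s]: detection call for the same cell, 'P', 'M' or 'A'."""
    # Drop genes that are called absent ('A') in every sample.
    kept = [g for g in range(len(values)) if any(c != "A" for c in calls[g])]
    # log2-transform the remaining genes.
    logged = [[math.log2(v) for v in values[g]] for g in kept]
    # Median-center each sample (column) across the retained genes.
    n_samples = len(values[0])
    col_medians = [median(row[s] for row in logged) for s in range(n_samples)]
    centered = [[row[s] - col_medians[s] for s in range(n_samples)] for row in logged]
    return kept, centered

# Hypothetical toy input: four genes, two samples.
toy_values = [[8, 16], [4, 64], [2, 2], [32, 16]]
toy_calls = [["P", "P"], ["P", "A"], ["A", "A"], ["P", "M"]]
kept_genes, centered_matrix = preprocess(toy_values, toy_calls)
```

In this toy input the third gene is absent in both samples and is removed before the transformation, so it does not influence the per-sample medians.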
Prediction of ER status
Two classification algorithms, WV (21) and SVMs (22), were used to classify breast tumors according to ER subtype. Classification accuracy is defined as the number of correctly classified samples divided by the total number of samples. For the WV analyses, classification accuracy was determined using a gene set of the top 50 discriminating genes for ER status, while the SVM-based binary classifier utilized all genes.
Weighted voting.
The weighted voting algorithm utilizes a signal-to-noise (S2N) metric to perform binary classifications. Each gene belonging to a predictor set is assigned a vote, expressed as the weighted difference between the gene expression level in the sample to be classified and the average class mean expression level. Weighting is determined using the correlation metric
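$$P(g) = \frac{\mu_1 - \mu_2}{\sigma_1 + \sigma_2}$$

where $\mu_1, \mu_2$ and $\sigma_1, \sigma_2$ denote means and standard deviations of expression levels of the gene in each of the two classes. The ultimate vote for a particular class assignment is computed by summing all weighted votes made by each gene used in the class discrimination. The prediction strength (PS) is defined as

$$\mathrm{PS} = \frac{V_{\mathrm{win}} - V_{\mathrm{lose}}}{V_{\mathrm{win}} + V_{\mathrm{lose}}}$$

where $V_{\mathrm{win}}$ and $V_{\mathrm{lose}}$ are the vote totals for the winning and losing classes, following the weighted voting formulation of (21).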
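A minimal sketch of weighted voting is given below. It assumes the standard formulation of reference (21): each gene's weight is the difference of the class means divided by the sum of the class standard deviations, its vote is that weight times the distance of the sample from the midpoint of the class means, and PS is the winning vote total minus the losing total, divided by their sum. The function names and toy data are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def train_wv(class1, class2):
    """class1/class2: lists of samples; each sample lists the predictor genes' values."""
    params = []
    for g in range(len(class1[0])):
        x1 = [s[g] for s in class1]
        x2 = [s[g] for s in class2]
        m1, m2 = mean(x1), mean(x2)
        weight = (m1 - m2) / (stdev(x1) + stdev(x2))  # signal-to-noise weight
        midpoint = (m1 + m2) / 2                      # average of the two class means
        params.append((weight, midpoint))
    return params

def predict_wv(params, sample):
    """Return (predicted class, prediction strength) for one sample."""
    votes = [w * (x - b) for (w, b), x in zip(params, sample)]
    v1 = sum(v for v in votes if v > 0)    # total vote for class 1
    v2 = sum(-v for v in votes if v < 0)   # total vote for class 2
    total = v1 + v2
    ps = abs(v1 - v2) / total if total else 0.0
    return (1 if v1 >= v2 else 2), ps

# Hypothetical two-gene predictor; class 1 could stand for ER+, class 2 for ER-.
er_pos = [[2.0, -1.0], [3.0, -2.0], [4.0, -3.0]]
er_neg = [[-2.0, 1.0], [-3.0, 2.0], [-4.0, 3.0]]
wv_params = train_wv(er_pos, er_neg)
clear_call = predict_wv(wv_params, [2.5, -1.5])
mixed_call = predict_wv(wv_params, [1.0, 1.0])
```

The second toy sample splits its votes between the two classes, which is the situation that yields a low PS.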
Support vector machine.
Support vector machines are classification algorithms which define a discrimination surface in the utilized feature (gene) space that attempts to maximally separate classes of training data (22). An unknown test sample's position relative to the discrimination surface determines its class. Distances are usually calculated in the n-dimensional gene space, corresponding to the total number of gene expression values considered. We used SVM-FU (available at www.ai.mit.edu/projects/cbcl/) with the linear kernel to implement the SVM analysis. The confidence of each SVM prediction is based on the distance of a test sample from the discrimination surface, as previously described (23).
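The distance-based confidence for a linear kernel can be illustrated with a short sketch. It assumes an already trained linear SVM described by a weight vector and a bias; the values below are hypothetical and are not taken from SVM-FU, and the exact mapping from distance to confidence used in (23) is not reproduced here.

```python
import math

def signed_distance(weights, bias, sample):
    """Signed Euclidean distance of a sample from the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return score / norm

def classify(weights, bias, sample):
    """The side of the surface gives the class; the absolute distance gives the confidence."""
    d = signed_distance(weights, bias, sample)
    return ("ER+" if d > 0 else "ER-"), abs(d)

# Hypothetical two-gene model.
w_demo, b_demo = [3.0, 4.0], -5.0
far_sample = classify(w_demo, b_demo, [3.0, 4.0])
near_sample = classify(w_demo, b_demo, [1.0, 0.0])
```

A sample lying close to the discrimination surface, such as the second one here, receives a small distance and therefore a low-confidence call.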
Identification of low-confidence tumors
Owing to the clinical importance of achieving good prediction confidence, we conservatively chose a series of high confidence thresholds to minimize potential false positive classifications. A tumor sample was assigned to the low-confidence category if its PS from WV was less than this threshold. For each of the different data sets, the threshold selected was derived from the leave-one-out cross validation (LOOCV) results (Fig. 1 and Supplementary Material) at points where the ER+ and ER- tumors demonstrated qualitatively reduced prediction strengths compared with the majority of tumors. For our in-house data set, the threshold was determined from the training set only. This analysis led to thresholds of 0.4, 0.4 and 0.7 for our in-house, Stanford and Rosetta data sets respectively, corresponding to low-confidence proportions of 16.7, 18.9 and 15.4% of the total number of tumors. Similar results were obtained for all data sets when the threshold cut-offs were varied by ±10% (Supplementary Material).
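The thresholding rule can be written down directly. The sketch below uses the thresholds reported above (0.4, 0.4 and 0.7); the function names and the PS values are hypothetical, and the `scale` argument is only an illustrative way of varying a cut-off by ±10%.

```python
# Thresholds as reported in the text for the three data sets.
THRESHOLDS = {"in-house": 0.4, "Stanford": 0.4, "Rosetta": 0.7}

def confidence_category(ps, dataset, scale=1.0):
    """Label a tumor 'low' if its prediction strength is below the (scaled) threshold."""
    return "low" if ps < THRESHOLDS[dataset] * scale else "high"

def low_confidence_fraction(ps_values, dataset, scale=1.0):
    """Fraction of tumors falling in the low-confidence category."""
    low = sum(1 for ps in ps_values if confidence_category(ps, dataset, scale) == "low")
    return low / len(ps_values)

# Hypothetical prediction strengths from a cross-validation run.
toy_ps = [0.9, 0.3, 0.8, 0.95, 0.41]
baseline_fraction = low_confidence_fraction(toy_ps, "in-house")
raised_fraction = low_confidence_fraction(toy_ps, "in-house", scale=1.1)
```

Re-running the assignment with `scale` set to 0.9 and 1.1 is one simple way to check how sensitive the low-confidence group is to the chosen cut-off.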
Selection of differentially expressed genes and determination of expression perturbations
SAM is a statistical methodology developed to identify genes that are differentially expressed between separate groups (11). Genes are ranked according to their statistical likelihood of being regulated. The ranking metric is the S2N ratio, which is similar to that used in WV. Thus, genes identified as being highly regulated by SAM will also contribute highly towards the WV discriminator. The SAM algorithm also performs a permutation analysis of the expression data to estimate the number of genes identified as being differentially regulated by random chance (i.e. false positives). Expressed as a proportion of the genes called significant, this number is the false discovery rate (FDR). Depending upon the desired stringency, different reports have used FDRs ranging from <5 to 33% (24,25). Student's t-test was used to compare levels of expression in the SAM-133 gene set between high- and low-confidence groups. A gene was classified as exhibiting significantly perturbed expression if its P-value was less than 0.05.
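The per-gene comparison between high- and low-confidence groups can be sketched as below. The pooled-variance t statistic is the usual Student's statistic; because the Python standard library has no t-distribution function, the P-value here is obtained by an exact permutation test, which is a substitution made for this illustration and not the procedure used in the paper. Function names and expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance

def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def permutation_p_value(a, b):
    """Two-sided exact permutation P-value over all relabelings of the pooled samples."""
    observed = abs(t_statistic(a, b))
    pooled = a + b
    n = len(pooled)
    hits = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so that relabelings tied with the observed split are counted.
        if abs(t_statistic(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical log-expression of one gene in four high- and four low-confidence tumors.
high_conf = [1.0, 1.2, 0.9, 1.1]
low_conf = [2.0, 2.3, 1.9, 2.2]
t_value = t_statistic(high_conf, low_conf)
p_value = permutation_p_value(high_conf, low_conf)
is_perturbed = p_value < 0.05
```

Exhaustive enumeration is only practical for small groups; with larger groups a random sample of relabelings, or a t-distribution lookup from a statistics package, would be the usual choice.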
Computational identification of EREs using DEREF
A computational algorithm, DEREF (14), was used to identify putative EREs, which are DNA binding sites of ER within promoters (for a description of the underlying methodology of DEREF, as well as sensitivity and specificity metrics, see http://sdmc.lit.org.sg/ERE-V2/index and the Supplementary Material). On the default setting, DEREF produces on average one ERE pattern prediction per 13 000 nt on human genomic DNA, with a sensitivity of 83%. To reduce the number of false positives, we applied in this report an additional criterion: a predicted ERE pattern of 17 nucleotides (14) also had to match [based on BLAST (26) matching without allowed gaps] a similar ERE pattern from at least one other human gene promoter, under conditions where the latter pattern could be predicted by DEREF at a sensitivity of 97%. The ERE searches in this report were performed against a database of 11 000 reference human promoter sequences covering the range [-3000, +1000] relative to the 5' end of the gene, which was generated using the FIE2 program (27,28). Some genes to be analyzed were not contained in this promoter database, and the ERE searches for these genes were thus not performed. Such genes are denoted in Table 2 by N/A.
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at HMG Online.
| ACKNOWLEDGEMENTS |
|---|
P.T. thanks Hui Kam Man for his encouragement and support. This work was supported by a grant from Agenica Research.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +65 64368345; Fax: +65 62265694; Email: cmrtan{at}nccs.com.sg
| REFERENCES |
|---|
- Tavassoli, F.A. and Schnitt, S.J. (1992) Pathology of the Breast. Elsevier, Oxford.
- Biswas, D.K., Averboukh, L., Sheng, S., Martin, K., Ewaniuk, D.S., Jawde, T.F., Wang, F. and Pardee, A.B. (1998) Classification of breast cancer cells on the basis of a functional assay for estrogen receptor. Mol. Med., 4, 454-467.
- Gruvberger, S., Ringner, M., Chen, Y., Panavally, S., Saal, L.H., Borg, A., Ferno, M., Peterson, C. and Meltzer, P. (2001) Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns. Cancer Res., 61, 5979-5984.
- West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J.A. Jr, Marks, J.R. and Nevins, J.R. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl Acad. Sci. USA, 98, 11462-11467.
- Pietras, R.J., Arboleda, J., Reese, D.M., Wongvipat, N., Pegram, M.D., Ramos, L., Gorman, C.M., Parker, M.G., Sliwkowski, M.X. and Slamon, D.J. (1995) HER-2 tyrosine kinase pathway targets estrogen receptor and promotes hormone-independent growth in human breast cancer cells. Oncogene, 10, 2435-2446.
- Kurokawa, H. and Arteaga, C.L. (2001) Inhibition of erbB receptor (HER) tyrosine kinases as a strategy to abrogate antiestrogen resistance in human breast cancer. Clin. Cancer Res., 12, 4436s-4442s.
- Bange, J., Zwick, E. and Ullrich, A. (2001) Molecular targets for breast cancer therapy and prevention. Nat. Med., 7, 548-552.
- Chia, K.S., Seow, A., Lee, H.P. and Shanmugaratnam, K. (2000) Cancer Incidence in Singapore, 1993-1997. Singapore Cancer Registry, Singapore.
- Sorlie, T., Perou, C.M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S. et al. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl Acad. Sci. USA, 98, 10869-10874.
- Van 't Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530-536.
- Tusher, V.G., Tibshirani, R. and Chu, G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, 98, 5116-5121.
- Kallioniemi, A., Kallioniemi, O.P., Piper, J., Tanner, M., Stokke, T., Chen, L., Smith, H.S., Pinkel, D., Gray, J.W. and Waldman, F.M. (1994) Detection and mapping of amplified DNA sequences in breast cancer by comparative genomic hybridization. Proc. Natl Acad. Sci. USA, 91, 2156-2160.
- Charpentier, A.H., Bednarek, A.K., Daniel, R.L., Hawkins, K.A., Laflin, K.J., Gaddis, S., MacLeod, M.C. and Aldaz, C.M. (2000) Effects of estrogen on global gene expression: identification of novel targets of estrogen action. Cancer Res., 60, 5977-5983.
- Bajic, V.B., Tan, S.L., Chong, A., Tang, S., Strom, A., Gustafsson, J., Lin, C.Y. and Liu, E. (2003) Dragon ERE Finder version 2: a tool for accurate detection and analysis of estrogen response elements in vertebrate genomes. Nucl. Acids Res., 31, 3605-3607.
- Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X. et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503-511.
- Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., Radmacher, M., Simon, R., Yakhini, Z., Ben-Dor, A. et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406, 536-540.
- Joensuu, H., Isola, J., Lundin, M., Salminen, T., Holli, K., Kataja, V., Pylkkanen, L., Turpeenniemi-Hujanen, T., Von Smitten, K. and Lundin, J. (2003) Amplification of erbB2 and erbB2 expression are superior to estrogen receptor status as risk factors for distant recurrence in pT1N0M0 breast cancer: a nationwide population-based study. Clin. Cancer Res., 9, 923-930.
- Grunt, T.W., Saceda, M., Martin, M.B., Lupu, R., Dittrich, E., Krupitza, G., Harant, H., Huber, H. and Dittrich, C. (1995) Bidirectional interactions between the estrogen receptor and the cerbB-2 signaling pathways: heregulin inhibits estrogenic effects in breast cancer cells. Int. J. Cancer, 63, 560-567.
- Stoica, G.E., Franke, T.F., Wellstein, A., Morgan, E., Czubayko, F., List, H.J., Reiter, R., Martin, M.B. and Stoica, A. (2003) Heregulin-beta1 regulates the estrogen receptor-alpha gene expression and activity via the ErbB2/PI 3-K/Akt pathway. Oncogene, 22, 2073-2087.
- Mazumdar, A., Wang, R.A., Mishra, S.K., Adam, L., Bagheri-Yarmand, R., Mandal, M., Vadlamudi, R.K. and Kumar, R. (2000) Transcriptional repression of oestrogen receptor by metastasis-associated protein 1 corepressor. Nat. Cell Biol., 3, 30-37.
- Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.
- Vapnik, V. (1998) Statistical Learning Theory. Wiley, New York.
- Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P. et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. USA, 98, 15149-15154.
- Mueller, A., O'Rourke, J., Grimm, J., Guillemin, K., Dixon, M.F., Lee, A. and Falkow, S. (2003) Distinct gene expression profiles characterize the histopathological stages of disease in Helicobacter-induced mucosa-associated lymphoid tissue lymphoma. Proc. Natl Acad. Sci. USA, 100, 1292-1297.
- Sanoudou, D., Haslett, J.N., Kho, A.T., Guo, S., Gazda, H.T., Greenberg, S.A., Lidov, H.G.V., Kohane, I.S., Kunkel, L.M. and Beggs, A.H. (2003) Expression profiling reveals altered satellite cell numbers and glycolytic enzyme transcription in nemaline myopathy muscle. Proc. Natl Acad. Sci. USA, 100, 4666-4671.
- Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389-3402.
- Chong, A., Zhang, G. and Bajic, V.B. (2002) Information and sequence extraction around the 5'-end and translation initiation site of human genes. In Silico Biol., 2, 461-465.
- Chong, A., Zhang, G. and Bajic, V.B. (2003) FIE2: a program for the extraction of genomic DNA sequences around the start and translation initiation site of human genes. Nucl. Acids Res., 31, 3546-3553.