Human Molecular Genetics, 2001, Vol. 10, No. 19 2133-2141
© 2001 Oxford University Press
Detecting differentially expressed genes in multiple tag sampling experiments: comparative evaluation of statistical tests
CRIBI Biotechnology Centre and 1Department of Biology, University of Padova, via G. Colombo 3, 35131, Padova, Italy
Received May 31, 2001; Revised and Accepted July 9, 2001.
| ABSTRACT |
|---|
The comparison of several statistical methods currently used for detection of differentially expressed genes was attempted both by a simulation approach and by the analysis of data sets of human expressed sequence tags, obtained from UniGene. In the simulated mixed case, mimicking a situation close to reality, the general χ² test was unexpectedly the most efficient in multiple tag sampling experiments, especially when dealing with variations affecting weakly expressed genes. On the other hand, Audic and Claverie's method proved the most efficient for detecting differences in gene expression when dealing with pairwise comparisons. By applying the above methods to UniGene-based data sets concerning two human kidney tumours compared with normal kidney tissue, three novel genes overexpressed in these tumours were identified. Software and additional information on statistical methodologies, simulation approach and data are available at http://telethon.bio.unipd.it/bioinfo/IDEG6/.
| INTRODUCTION |
|---|
Tissue-specific expression level of genes can be estimated by using the frequency of gene transcripts in unbiased cDNA libraries (1). According to this view, the study of genomic expression was attempted by serial analysis of gene expression (SAGE) (2) and, more recently, by cDNA array technology (3).
Each of these methods is affected by limitations and biases: the array technology suffers from hybridization artefacts (e.g. cross-hybridization of closely related paralogous sequences), whereas SAGE is affected by sampling and sequencing errors, and by non-uniqueness and non-randomness of tag sequences (4). On the other hand, in silico comparison of expressed sequence tag (EST) frequencies among different unbiased cDNA libraries, proposed as an alternative approach (1,5,6), is strongly dependent on the availability of sufficiently large and unbiased cDNA libraries.
The basic problem, when dealing with the analysis of large amounts of expression data, is the efficiency of statistical tests used to detect which genes appear to be differentially expressed in different conditions.
About 5 years ago, Fisher's exact test was proposed for evaluating differentially expressed genes by the digital differential display (DDD) tool at the Cancer Genome Anatomy Project (CGAP) (7) website. The adequacy of this test statistic was contested by Audic and Claverie (8), who developed a more appropriate test for pairwise comparison of gene expression data in tag sampling experiments. Later, a review of theoretical and computational approaches for the identification of differential and coordinated gene expression was provided (9). Moreover, the development of statistical tools for the identification of differentially expressed genes when dealing with multi-conditional tag sampling experiments was recently attempted by different authors (8,10,11).
The aim of this paper is to compare the efficiency of the available methods by using both a simulation approach and the analysis of data sets obtained from the UniGene database.
| RESULTS |
|---|
Simulation approach
Simulation of genome expression was used to compare the efficiency of different test statistics in different experimental conditions. The detection of differentially expressed genes was attempted in a series of simulated data sets.
Expression levels are given by integers resulting from tag sampling techniques. The basic assumption is that the level of expression of a given gene may be inferred from the number of corresponding ESTs obtained from unbiased cDNA libraries. The random sampling inherent to this kind of analysis allows use of the Poisson distribution as a model of data structure.
Expression level matrices of 5000 genes per 10 libraries were generated. The expression levels associated to 3000 genes were obtained under the assumption of equal expression over all the 10 libraries, whereas for the additional 2000 genes differential expression in one or two libraries was assumed (Materials and Methods).
Expression level matrices, reflecting different gene expression situations, were generated and statistical tests were applied. Having selected a significance threshold of 0.001, we attempted to identify differentially expressed genes with each test.
Statistical tests
We applied the following tests: AC statistic (8), Fisher's 2 × 2 exact test, χ² 2 × 2 test statistic, R statistic (11), GT statistic (10), and the χ² test statistic for the analysis of the generated matrices. The efficiency of a given test statistic was assessed by recording the number of false negative and false positive cases, i.e. the percentage of differentially expressed genes which are not recognized by the test statistic and the percentage of non-differentially expressed genes which are recognized as positive.
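The two error percentages defined above can be illustrated with a minimal Python sketch. The function name `error_rates`, the toy P-values and the labels are hypothetical and are not taken from the IDEG.6 or evaluation programs; the only assumption is that a gene is called positive when its P-value falls below the significance threshold.

```python
def error_rates(p_values, is_outlier, alpha=0.001):
    """Return (false negative %, false positive %) for one test statistic.

    p_values   : one P-value per simulated gene
    is_outlier : True for genes simulated as differentially expressed
    alpha      : significance threshold (0.001 in the simulations)
    """
    called = [p < alpha for p in p_values]
    n_outliers = sum(is_outlier)
    n_mean = len(is_outlier) - n_outliers
    # Outlier genes that the test failed to call.
    missed = sum(1 for c, o in zip(called, is_outlier) if o and not c)
    # Equally expressed genes that the test called by mistake.
    spurious = sum(1 for c, o in zip(called, is_outlier) if c and not o)
    fn_pct = 100.0 * missed / n_outliers if n_outliers else 0.0
    fp_pct = 100.0 * spurious / n_mean if n_mean else 0.0
    return fn_pct, fp_pct


# Toy data: three outlier genes followed by three equally expressed genes.
toy_p = [0.0001, 0.2, 0.0005, 0.5, 0.0002, 0.9]
toy_labels = [True, True, True, False, False, False]
fn_pct, fp_pct = error_rates(toy_p, toy_labels)
print(round(fn_pct, 1), round(fp_pct, 1))
```

In this toy case one outlier out of three is missed and one equally expressed gene out of three is wrongly called, so both percentages are one third.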
Figures 1 and 2 summarize the results obtained with one and two outliers, respectively. Figure 3A summarizes the results of the mixed case with two outliers, while Figure 3B shows the results of the case with different numbers of total tags per library. In each figure, the percentage of false negatives is plotted against the extent of difference in expression levels (gap-values), according to expression levels (λ-values).
When dealing with highly expressed genes (with 94 ≤ λ ≤ 106), for gap-values = 300, 200 and 100, all differentially expressed genes are detected by all test statistics. On the other hand, as the expression level decreases, the percentage of false negatives increases considerably, even with large gap-values. In particular, for weakly expressed genes (with 4 ≤ λ ≤ 6), the average false negative percentage over all the considered gap-values reaches 70%. This means that the 2-fold criterion, commonly used (12) for the identification of differentially expressed genes, is sufficiently sensitive only for highly expressed genes. In the case of highly expressed genes with one outlier, all test statistics seem to give similar results: the higher the gap-value, the better the quality of the results. When the gap-value = 50, the best results were obtained by the AC statistic, with 22% of false negatives. On the other hand, if the gap-value is <25, false negatives increase abruptly to almost 100% with all tests.
When dealing with moderate expression levels (20 ≤ λ ≤ 50), the trend of all the tests is quite similar; however, the best results are obtained by the AC and GT statistics, whereas the R statistic seems to miss the highest number of true positives. With weakly expressed genes (3 ≤ λ ≤ 5), pairwise comparisons seem to be less effective than multi-comparison methods. In particular, the general χ² test misses the lowest number of differentially expressed genes (56% of false negatives for λ = 5 and gap = 15; >90% of false negatives for λ = 5 and gap-value <15; 77% of false negatives for λ = 3 and gap-value = 9; ~100% of false negatives for λ = 3 and gap-value <9).
With two outliers, over all the different expression levels, the general χ² test seems to be the most efficient, followed by the AC statistic. In this case too, Fisher's exact test and the χ² test for two-way contingency tables seem inadequate.
Results of the mixed case are similar for all the tests. The general χ² statistic is the most efficient, with either one (data not shown) or two outliers (for the maximum gap-value, 63% of false negatives with one outlier and 44% with two outliers; Fig. 3A).
This may be partly due to the fact that in the mixed case most genes are weakly expressed and that the general χ² test is the most efficient test when dealing with low levels of expression.
The R statistic shows quite a good performance with two outliers (for the maximum gap-value, 81% of false negative with one outlier and 57% with two outliers).
In the analysis of the simulated mixed case with different numbers of tags per library, which is the situation closest to reality, the R and AC test statistics produce similar results (90% of false negatives, on average); Fisher's exact test and the χ² test for two-way contingency tables have the highest percentage of false negatives (92%); whereas the general χ² test again seems to be the most efficient method (80% of false negatives). For this particular case, the average loss function, showing the trade-off between false positives and false negatives in relation to variation of the significance threshold, was calculated for the different tests (Fig. 4).
Software for the computation of all test statistics described (for users' tag sampling data sets) is freely available to academics at the web site http://telethon.bio.unipd.it/bioinfo/IDEG6/.
Analysis of UniGene samples
By using a method published elsewhere (13), we reconstructed the expression profiles of genes transcribed in normal adult kidney and in two different kidney tumours from a collection of ESTs available in UniGene. The selected data sets were composed of 8320 ESTs and 4269 UniGene clusters (Lib.756, Lib.858, Lib.1009, Lib.5013) for normal kidney; 5006 ESTs and 2832 UniGene clusters (Lib.508) for Wilms' tumour of the kidney; and 5137 ESTs and 2581 UniGene clusters (Lib.565) for adult renal cell carcinoma.
The expression data of the reconstructed expression profiles were merged in a matrix of three tissue columns and 5719 gene rows. This expression data table was analysed by all the test statistics described above, in order to identify which genes appear differentially expressed between normal kidney tissue and one or both of the kidney tumours considered. Description, expression data and results of statistical tests for each of these genes are reported in Table 1.
For all tests, the statistical significance was set to 0.05. For pairwise comparison tests, the significance level was corrected by the number of genes and of comparisons, according to the approximate Bonferroni correction (α = 3E-06), while the significance level for multi-comparison tests was corrected only by the number of genes (α = 9E-06). For the GT test, differential expression is assessed by values of a decision function. We used as thresholds for weak, moderate and strong evidence of differential expression the same values proposed by Greller and Tobin in their original paper (10) (weak evidence, 0 < GT < 0.33; moderate evidence, 0.33 < GT < 0.66; strong evidence, 0.66 < GT < 1). Although the total number of genes in the sample was 5719, statistical analysis was restricted to 665 genes, which show a number of ESTs >4 in all the considered libraries. Out of the 665 considered genes, 34 (5.1%) were differentially expressed, according to at least one of the applied test statistics.
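The corrected significance levels quoted above can be reproduced with a short arithmetic sketch. It assumes that the correction uses all 5719 genes and that the number of pairwise comparisons is three (every pair among the three tissues). The text does not state the number of comparisons explicitly, so that value is an assumption, chosen because it reproduces the reported threshold.

```python
nominal_alpha = 0.05
n_genes = 5719
n_tissues = 3

# Assumption: every pair of tissues is compared, i.e. 3 pairwise comparisons.
n_pairwise = n_tissues * (n_tissues - 1) // 2

# Bonferroni-style correction: divide by the number of tests performed.
alpha_pairwise = nominal_alpha / (n_genes * n_pairwise)
alpha_multi = nominal_alpha / n_genes

print(f"pairwise tests:         {alpha_pairwise:.1e}")
print(f"multi-comparison tests: {alpha_multi:.1e}")
```

Rounded to one significant digit, these values agree with the 3E-06 and 9E-06 thresholds given in the text.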
As expected, pairwise comparison methods appeared more conservative than multi-comparison tests. A fairly good agreement was observed between pairwise methods, except in the case of a gene detected only by the χ² 2 × 2 test. In four cases, all the multi-comparison methods were concordant, whereas none of the pairwise comparison methods produced significant values.
For 16 out of 34 genes, all the statistical tests were in agreement if we also include three cases of weak evidence of differential expression according to the GT test. It is worth noting that 12 genes were detected only by the general χ² test.
When considering specific genes, the expression level in one tumour does not seem significantly correlated with that in the other.
Among the 34 genes differentially expressed in at least one tumour tissue, two are involved in RNA splicing (heterogeneous nuclear ribonucleoprotein C and U5 snRNP-specific protein) and 12 in protein synthesis [nine genes coding for ribosomal proteins, two genes for translation initiation or elongation factors and for the poly(A)-binding protein]. All these genes appeared more expressed in tumour tissues than in the normal kidney.
An additional 20 genes appeared differentially expressed: two were found only in the normal kidney (glutathione peroxidase 3 and cathepsin K), five were found overexpressed in KT1, 11 in KT2, and two in both tumour tissues (Table 1).
| DISCUSSION |
|---|
The identification of differentially expressed genes in human tissues is relevant not only for its intrinsic biological significance, but also for discovering potential pharmaceutical targets and diagnostic or prognostic markers.
In this paper, we attempt to compare different test statistics for detecting differentially expressed genes in multi-condition tag sampling experiments, such as systematic sequencing of cDNA libraries or SAGE analysis. In all these cases, the analysis of data sets is complicated by the combination of different factors: different sample size, different expression levels, different extent and pattern of differential expression.
We applied to the same data sets (simulated and real UniGene data sets) the test statistics proposed by Audic and Claverie (8), Stekel et al. (11) and Greller and Tobin (10), Fisher's 2 × 2 exact test and the χ² test statistic.
As pointed out by Audic and Claverie (8), the Poisson distribution is the most adequate to describe tag sampling data. The use of this distribution for generation of data sets might give a slight advantage to AC and R statistics, which are based on the Poisson distribution assumption.
The simulation approach allowed us to evaluate separately and independently the performance of each test in specific situations. In particular, we generated data sets with highly, moderately or weakly expressed genes, data sets with a mixture of expression levels and data sets with a mixture of expression levels and different sample size per library.
False positives for each test under all simulated experimental conditions, at the selected significance threshold (0.001), were always found to be close to 0. As expected, the number of false negatives was strictly related to the extent of the simulated differential expression. The analysis of the trade-off between false positives and false negatives provides additional information on the behaviour of the test statistics. We compared the average loss function of the different tests in the mixed case of different expression levels and different sample sizes per library. Although the AC test shows very low loss for small criticality thresholds, the multiple χ² test and R always appear the most adequate (Fig. 4).
Differences in efficiency among statistical tests increase when analysing more complex data sets (mixed case with and without different sample sizes). Fisher's exact test, used to analyse tag sampling experimental data in the Cancer Genome Anatomy Project (7) (http://www.ncbi.nlm.nih.gov/ncicgap/), seems inappropriate for the analysis of differentially expressed genes, as Audic and Claverie (8) already pointed out. The presence of the group 'all other genes' in the contingency table is questionable, because the expression of a gene in all the considered conditions does not imply that the other genes are all necessarily shared among them. The same limitation applies to the χ² 2 × 2 test. In addition, Fisher's 2 × 2 exact test requires that the marginal frequencies in both margins are fixed a priori. In the case under consideration, column margins may be considered fixed (total number of tags) but row margins depend on the intrinsic gene expression level, which is a priori unknown. Unlike the χ² 2 × 2 test and Fisher's exact test, the AC statistic is based on the most appropriate probabilistic model.
According to our observations, the GT statistic is useful for the detection of single outliers, but unfortunately useless when analysing more complex expression patterns.
It is noteworthy that a classical test statistic such as the general χ² test appears to be the most efficient for detecting differentially expressed genes. On the other hand, the R statistic may be very useful when searching for genes differentially expressed in one or more cases (tissues, conditions etc.), especially when considering mostly highly or moderately expressed genes.
The χ² and R statistics are asymptotically equivalent. However, the adequacy of the χ² approximation depends on the total number of ESTs over all the libraries and on the number of cells of the contingency table. Larntz (14), Koehler and Larntz (15) and Koehler (16) showed that χ² is valid with smaller sample sizes and sparser tables than the likelihood-ratio χ² test (R). They also showed that when most expected frequencies are <5, although neither test statistic provides a good approximation, the R distribution is more conservative than χ². The R statistic is therefore expected to fail to identify some differences.
According to our results, the general χ² test and the AC statistic seem able to detect the most relevant information on differentially expressed genes, even when dealing with small numbers, though with expected values >5.
From a general point of view, the definition of differentially expressed genes is not intrinsically linked to a specific extent of the difference, whereas the statistical treatment of data implies the selection of significance thresholds. On the other hand, the detection of smaller and smaller variations of expression depends on the sample size, with no theoretical limit. It can be noticed that studies focusing on comparison of expression data for a large number of genes, typically lead to the identification of a small number of genes showing the most striking differential expression (17). From a biological point of view, on the contrary, relatively small changes in expression levels may also be highly significant in some instances.
When applying the above described test statistics to compare the in silico reconstructed expression profiles of normal adult human kidney and of two different renal tumours (http://telethon.bio.unipd.it/bioinfo/IDEG6/), 34 genes were found to be differentially expressed between normal and cancerous kidney tissues. Most of them have already been reported to be differentially expressed in kidney tumours and/or in other cancer tissues. Interestingly, the gene coding for glutathione peroxidase, the activity of which is reportedly decreased in human kidney tumours (18), appears silenced in both the tumour tissues considered here.
Three collagen genes (collagen, type I, α1; collagen, type III, α1; collagen, type III, α1) and genes coding respectively for connective tissue growth factor, fibronectin 1, tenascin C and matrix Gla protein, appear highly expressed in the two renal tumours considered in this study. Indeed, components of the extracellular matrix are known to regulate the migration and the invasion of renal carcinoma cells (19) and to influence proliferation, differentiation and morphogenesis in kidney tumours (20). Four out of nine ribosomal protein genes (RP3a, RPL7, RPS18 and RPP0), highly expressed in the two considered renal tumours, are reportedly overexpressed in tumour tissues (21–25). The present study succeeded in detecting three novel genes, which appeared differentially expressed in normal kidney versus at least one tumour tissue. These genes (PRO2047, PRO2605 and FLJ10814) may become candidate new tumour markers and/or possible targets for chemotherapy.
According to the statistical theory of multiple comparisons, the repeated use of pairwise tests on a large number of libraries is inappropriate. In general, when dealing with more than two libraries only the χ², R and GT test statistics should be used. However, there are situations in which the joint use of single and pairwise tests may give useful information about the hypothesis testing. In particular, if we want to test the equality of the gene expression data of each disease state versus the control (normal condition), R and χ² would also select those genes with different expression levels among the diseases.
Whereas sophisticated software is presently proposed for the analysis of differential gene expression, this study suggests that the most efficient test statistics are the general χ² test for multiple comparisons and the Audic and Claverie test for pairwise comparisons. The combined application of these two methods is probably the most adequate solution to the problem of detecting differentially expressed genes in multiple tag sampling experiments with cDNA libraries.
| MATERIALS AND METHODS |
|---|
In the simplest experimental situation the estimated gene expression levels are compared between two conditions (conditions A and B). Estimated expression levels are integer numbers resulting from random EST sampling techniques. Therefore, the number x of tags observed for a gene follows a binomial distribution with parameters P and N, where P is the probability that a sampled clone corresponds to the gene and N is the total number of cDNA clones sampled at random. When P < 5% and N > 1000, the binomial distribution tends to a Poisson distribution, with parameter λ = NP. Then,

$$\Pr(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

where λ is the actual number of tags of this type per N clones in the library.
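As a numerical illustration of the binomial-to-Poisson approximation described above, the following Python sketch compares the two probability mass functions. The values N = 5000 clones and P = 0.001 are hypothetical, chosen only to satisfy the stated conditions (P < 5%, N > 1000); they are not taken from the simulated data sets.

```python
import math


def binom_pmf(x, n, p):
    """Exact binomial probability of x tags among n sampled clones."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson probability of x tags with parameter lam = n * p."""
    return math.exp(-lam) * lam**x / math.factorial(x)


n_clones = 5000   # hypothetical library size
p_tag = 0.001     # hypothetical per-clone probability for one gene
lam = n_clones * p_tag

# Largest pointwise difference between the two distributions for x = 0..30.
max_diff = max(
    abs(binom_pmf(x, n_clones, p_tag) - poisson_pmf(x, lam))
    for x in range(31)
)
print(f"lambda = {lam}, largest pointwise difference = {max_diff:.2e}")
```

With these values the two distributions differ by far less than any individual probability near the mode, which is the sense in which the Poisson model is an adequate description of tag counts.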
Equally expressed genes will have statistically equal values of λ; on the other hand, differentially expressed genes will show highly statistically different values of λ. It is possible to simulate real-like expression matrices by generating expression values from a Poisson distribution with a given value of λ.
Gene expression analysis may be summarized in a matrix-like scheme where rows are genes, columns are libraries and the values in the cells of the matrix correspond to the expression levels of genes in libraries.
We generated the expression levels of 5000 genes over 10 libraries: 3000 under the hypothesis of equal expression levels over all the different libraries (mean cases), and 2000 under the hypothesis of different expression levels in one or more libraries (outlier cases).
Genes equally expressed in all libraries (mean cases)
For high values of λ, the Poisson distribution tends to a Gaussian distribution with mean and variance both equal to λ. Therefore, it is possible to calculate a 95% confidence interval of λ as the range of λ-values that are not significantly different.
Each expression level of the 3000 equally expressed genes was independently generated from a Poisson distribution, with λ-values randomly sampled from the 95% confidence interval (CI).
Differentially expressed genes (outlier cases)
Let λmin and λmax be the two extreme values of the confidence interval described previously. Let the variable gap (extent of the difference between expression levels) be the difference between a selected value outside the interval (λout) and λmax, representing the degree of differential expression. Values corresponding to differentially expressed genes were generated from a Poisson distribution with λ-values greater than λmax. Each matrix corresponds to a different gene expression level and a different gap level.
Since we sampled from a Poisson distribution with a given λ, we generated 70 matrices: all the 35 possible combinations of different expression and gap levels, each with one and with two outliers (Table 2).
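The generation procedure described above can be sketched in Python as follows. This is not the authors' R code. Several details are assumptions made for illustration: the confidence interval half-width is taken as 1.96·sqrt(λ / number of libraries), λ-values for equally expressed genes are drawn uniformly within the interval, and the matrix is much smaller than the 5000 × 10 matrices of the paper so that the example runs quickly. All function and variable names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) value by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_matrix(lam, gap, n_mean, n_outlier, n_libs=10, n_out_libs=1, seed=0):
    """Return (matrix, labels): genes in rows, libraries in columns."""
    rng = random.Random(seed)
    # Assumed 95% CI of lambda under the Gaussian approximation.
    half_width = 1.96 * math.sqrt(lam / n_libs)
    lam_min, lam_max = max(lam - half_width, 0.0), lam + half_width
    lam_out = lam_max + gap  # gap is measured from the upper CI limit
    matrix, labels = [], []
    for gene in range(n_mean + n_outlier):
        # Assumption: lambda is drawn uniformly inside the CI for each cell.
        lams = [rng.uniform(lam_min, lam_max) for _ in range(n_libs)]
        is_outlier = gene >= n_mean
        if is_outlier:
            for lib in rng.sample(range(n_libs), n_out_libs):
                lams[lib] = lam_out
        matrix.append([poisson(value, rng) for value in lams])
        labels.append(is_outlier)
    return matrix, labels


# Small demonstration: highly expressed genes, largest gap, one outlier library.
matrix, labels = simulate_matrix(lam=100, gap=300, n_mean=30, n_outlier=20)
print(matrix[0])    # an equally expressed gene
print(matrix[-1])   # a gene with one outlier library
```

With λ = 100 and ten libraries, the assumed interval is roughly 94 to 106, which matches the range quoted for highly expressed genes in the Results.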
Usually, when considering the distribution of the number of tags per gene, a large number of genes have low expression levels and, on the other hand, very few genes are highly expressed. For this reason, we also generated libraries with genes highly, moderately and weakly expressed, according to the known distribution (13).
In detail, we obtained a data set that was the result of merging random subset of the previously simulated data (mixed data sets). The subset sizes were selected according to the reputed distribution of tags per gene.
The above described data generation procedure assumes similar total numbers of ESTs for each of the 10 simulated libraries. Usually, when comparing different real cDNA libraries, the total number of ESTs per library is different: to analyse the impact of this situation on the test statistics, three additional matrices were generated by EST random sampling from the previously generated mixed dataset.
Different statistical tests [Audic and Claverie (8); Stekel et al. (11); Greller and Tobin (10); Fisher's exact test and χ² test] were implemented and applied to simulated data.
Further details about dataset, simulation procedure and test statistics described below are available online.
Audic and Claverie (8) developed a test statistic specifically adapted for tag sampling data. Assuming a probabilistic model (like Supplementary Material, equation 1) they calculated the probability of observing y number of ESTs in library A given that we have x number of ESTs in library B. The smaller this probability (Supplementary Material, equation 2) the more differentially expressed is the gene over the two libraries.
Given more than two libraries, the test statistic must be calculated for each possible pairwise comparison. Then, the obtained probability must be corrected for the total number of comparisons and for the total number of genes.
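A minimal Python sketch of the pairwise AC computation is given below. The conditional probability uses the formula published by Audic and Claverie (8), p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)], where N1 and N2 are the total tag counts of the two libraries. Its correspondence to the Supplementary Material equations is assumed, since those equations are not reproduced in this text. The tail rule in `ac_tail` (taking the smaller of the two cumulative tails) is also an assumption, and the function names are hypothetical.

```python
import math


def ac_prob(y, x, n1, n2):
    """p(y | x): probability of y tags in library 2 given x tags in library 1."""
    ratio = n2 / n1
    log_p = (
        y * math.log(ratio)
        + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
        - (x + y + 1) * math.log(1 + ratio)
    )
    return math.exp(log_p)


def ac_tail(y, x, n1, n2):
    """Smaller cumulative tail of p(. | x) at the observed y (assumed rule)."""
    lower = sum(ac_prob(k, x, n1, n2) for k in range(y + 1))   # P(Y <= y)
    upper = 1.0 - lower + ac_prob(y, x, n1, n2)                # P(Y >= y)
    return min(lower, upper)


# Two libraries of equal size: 5 tags in the first, 50 or 5 in the second.
p_different = ac_tail(50, 5, 1000, 1000)
p_similar = ac_tail(5, 5, 1000, 1000)
print(f"{p_different:.2e}", f"{p_similar:.2f}")
```

For more than two libraries the function would be applied to every pair of libraries, and the resulting probabilities compared with a threshold corrected for the number of comparisons and genes, as described in the text.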
Fisher's 2 × 2 exact test is suitable for testing independence in two-way contingency tables with multinomial sampling, and it is called 'exact' because it does not use large-sample approximation distributions, but an exact distribution. Fisher's exact test needs the generation of a two-way contingency table over each possible library comparison and over all genes (see Supplementary Material). Test results must be corrected for the number of comparisons and of genes.
The χ² test can also be used to test independence in contingency tables but, unlike Fisher's exact test, it uses an asymptotic distribution and may be used on any type of contingency table, with any number of rows and columns and with only column margins fixed a priori.
We have included the χ² test both for two-way contingency tables and for the comparison of all libraries in the same table (general χ²). Detailed information is available online as Supplementary Material.
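As an illustration of the general χ² computation for a single gene, the Python sketch below builds a 2 × k table (tags of the gene versus all other tags, over k libraries) and returns the Pearson statistic with its degrees of freedom. The table layout is an assumption, since the exact construction is given only in the Supplementary Material, and the function name is hypothetical. The P-value would then be read from a χ² distribution with k − 1 degrees of freedom.

```python
def general_chi2(counts, lib_sizes):
    """Pearson chi-square for one gene across k libraries.

    counts    : tags of the gene in each library
    lib_sizes : total number of tags in each library (fixed column margins)
    Returns (statistic, degrees_of_freedom).
    """
    total_gene = sum(counts)
    total_tags = sum(lib_sizes)
    if total_gene == 0:
        return 0.0, len(counts) - 1
    stat = 0.0
    for observed, size in zip(counts, lib_sizes):
        # Expected counts under equal expression in all libraries.
        exp_gene = size * total_gene / total_tags
        exp_other = size - exp_gene
        stat += (observed - exp_gene) ** 2 / exp_gene
        stat += ((size - observed) - exp_other) ** 2 / exp_other
    return stat, len(counts) - 1


stat, df = general_chi2([30, 0, 0], [1000, 1000, 1000])
print(round(stat, 3), df)
```

A gene with identical counts in libraries of identical size gives a statistic of zero, whereas concentrating all 30 tags in one library gives a large value.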
The R statistic of Stekel et al. (11) is a likelihood ratio test: statistical theory shows that R tends to a χ² distribution. Since the test is to be used repeatedly on many thousands of genes, they deliberately use a rank-like value obtained with a randomization approach. The rank value, which is associated with a reliability value, depends on the particular data set used. With this type of approach, the test can be applied over all libraries, eliminating the problem of multiple comparisons (see Supplementary Material).
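The R statistic for one gene can be sketched in Python as follows, using the log-likelihood ratio form published by Stekel et al. (11): R = Σ x_i ln[x_i / (N_i f)], where x_i is the tag count of the gene in library i, N_i the library size and f the pooled frequency of the gene over all libraries. The randomization step used to derive rank-like reliability values is not shown, and the function name is hypothetical. In the usual likelihood-ratio formulation it is 2R that is compared with a χ² distribution with (number of libraries − 1) degrees of freedom.

```python
import math


def r_statistic(counts, lib_sizes):
    """Log-likelihood ratio statistic R for one gene across several libraries."""
    pooled_freq = sum(counts) / sum(lib_sizes)
    # Libraries with zero tags contribute nothing to the sum.
    return sum(
        x * math.log(x / (n * pooled_freq))
        for x, n in zip(counts, lib_sizes)
        if x > 0
    )


r_equal = r_statistic([10, 10, 10], [1000, 1000, 1000])
r_outlier = r_statistic([30, 0, 0], [1000, 1000, 1000])
print(round(r_equal, 3), round(r_outlier, 3))
```

Equal expression across libraries gives R = 0, and R grows as the counts depart from the proportions expected from the library sizes.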
Greller and Tobin (10) developed a robust technique to compare expression levels for more than two libraries based on the detection of markedly high or markedly low expression levels. At least 10 expression measurements per gene are required for a high reliability of the method.
Statistical analysis was done using R, a free software environment (http://www.r-project.org). The generation of simulated matrices was implemented in the R language. For the calculation of the test statistics and for the counting of false positive and false negative genes we implemented two different C programs (compiled on a SunOS Unix machine with the gcc compiler). The first, IDEG.6 (freely available for academics at http://telethon.bio.unipd.it/bioinfo/IDEG6/), takes the simulated expression matrix (in text format) as input, calculates each of the above test statistics and gives as output a text file where test values are arranged in a matrix-like scheme. The second, evaluation (freely available on request for academics), takes this output file and calculates for each test statistic (for each column) the total number (over the 5000 simulated genes) of genes that have a significantly different expression level over the 10 libraries. The significance level was set at 0.001.
Since we know a priori which genes have different expression levels (in one or two libraries, outlier cases), it is possible to check the behaviour of the different test statistics according to the number of false positives (mean case genes that are detected as outliers) or false negatives (outliers that are detected as mean).
The four unbiased cDNA libraries of normal human kidney (Lib.756, Lib.858, Lib.1009 and Lib.5013) and two cDNA libraries prepared from different kidney tumour tissues (Lib.508, Wilms' tumour; Lib.565, adult renal cell carcinoma) were selected and downloaded in order to obtain four different expression data sets, regarding normal and tumour kidney tissues.
In addition, two flat files of UniGene data (release #131 of February 28, 2001) were downloaded: the Hs.data file, containing all the data pertaining to all the UniGene clusters, and the Hs.seq.uniq file with the list of the sequences representative of UniGene clusters. Data were analysed by using novel PERL code software, developed in our laboratory (unpublished data). The expression profile of genes expressed in the considered tissue was built, by preparing a list of UniGene clusters for which at least one EST is represented in the set of cDNA libraries pertaining to the tissue and by retrieving all the information pertaining to the cluster from the original data files (UniGene cluster identification number, number of ESTs in the data set, percentage of the total detected transcription, gene symbol, Locuslink identification number, GenBank accession number of the sequence representative of the cluster and gene description).
The expression level of each gene in a given tissue was estimated as the number of ESTs of the gene in the considered data set over the total number of ESTs of the tissue (13). Expression data of genes represented in each data set were merged in a matrix that was used as input to IDEG.6. Significantly differentially expressed genes over the three considered tissues were selected.
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary material relating to this paper is available at http://www.hmg.oupjournals.org.
| ACKNOWLEDGEMENTS |
|---|
The authors are grateful to Professor Gerolamo Lanfranchi for critical reading and discussion. Many thanks also to Fabio d'Alessi for technical support. The financial support of MURST (Italian Ministry of University and Scientific and Technological Research) to G.A.D. is acknowledged.
| FOOTNOTES |
|---|
+ To whom correspondence should be addressed. Tel: +39 049 8276215; Fax: +39 049 8276209; Email: danieli@bio.unipd.it
| REFERENCES |
|---|
1 Okubo, K., Hori, N., Matoba, R., Niiyama, T., Fukushima, A., Kojima, Y. and Matsubara, K. (1992) Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nat. Genet., 2, 173–179.
2 Velculescu, V.E., Zhang, L., Vogelstein, B. and Kinzler, K.W. (1995) Serial analysis of gene expression. Science, 270, 484–487.
3 Phimister, B. (1999) Chipping forecast. Nat. Genet., 21, 1.
4 Stollberg, J., Urschitz, J., Urban, Z. and Boyd, C.D. (2000) A quantitative evaluation of SAGE. Genome Res., 10, 1241–1248.
5 Lee, N.H., Weinstock, K.G., Kirkness, E.F., Earle-Hughes, J.A., Fuldner, R.A., Marmaros, S., Glodek, A., Gocayne, J.D., Adams, M.D., Kerlavage, A.R. et al. (1995) Comparative expressed-sequence-tag analysis of differential gene expression profiles in PC-12 cells before and after nerve growth factor treatment. Proc. Natl Acad. Sci. USA, 92, 8303–8307.
6 Bortoluzzi, S. and Danieli, G.A. (1999) Towards an in silico analysis of transcription patterns. Trends Genet., 15, 118–119.
7 O'Brien, C. (1997) Cancer genome anatomy project launched. Mol. Med. Today, 3, 94.
8 Audic, S. and Claverie, J.M. (1997) The significance of digital gene expression profiles. Genome Res., 7, 986–995.
9 Claverie, J.M. (1999) Computational methods for the identification of differential and coordinated gene expression. Hum. Mol. Genet., 8, 1821–1832.
10 Greller, L.D. and Tobin, F.L. (1999) Detecting selective expression of genes and proteins. Genome Res., 9, 282–296.
11 Stekel, D.J., Git, Y. and Falciani, F. (2000) The comparison of gene expression from multiple cDNA libraries. Genome Res., 10, 2055–2061.
12 Chen, Y.W., Zhao, P., Borup, R. and Hoffman, E.P. (2000) Expression profiling in the muscular dystrophies: identification of novel aspects of molecular pathophysiology. J. Cell. Biol., 151, 1321–1336.
13 Bortoluzzi, S., d'Alessi, F., Romualdi, C. and Danieli, G.A. (2000) The human adult skeletal muscle transcriptional profile reconstructed by a novel computational approach. Genome Res., 10, 344–349.
14 Larntz, K. (1978) Small-sample comparison of exact levels for chi-squared goodness-of-fit statistics. J. Am. Statist. Assoc., 73, 253–263.
15 Koehler, K. and Larntz, K. (1980) An empirical investigation of goodness-of-fit statistics for sparse multinomials. J. Am. Statist. Assoc., 75, 336–344.
16 Koehler, K. (1986) Goodness-of-fit for log linear models in sparse contingency tables. J. Am. Statist. Assoc., 81, 483–493.
17 Backert, S., Gelos, M., Kobalz, U., Hanski, M.L., Bohm, C., Mann, B., Lovin, N., Gratchev, A., Mansmann, U., Moyer, M.P. et al. (1999) Differential gene expression in colon carcinoma cells and tissues detected with a cDNA array. Int. J. Cancer, 82, 868–874.
18 Di Ilio, C., Sacchetta, P., Angelucci, S., Zezza, A., Tenaglia, R. and Aceto, A. (1995) Glutathione peroxidase and glutathione reductase activities in cancerous and non-cancerous human kidney tissues. Cancer Lett., 91, 19–23.
19 Brenner, W., Gross, S., Steinbach, F., Horn, S., Hohenfellner, R. and Thuroff, J.W. (2000) Differential inhibition of renal cancer cell invasion mediated by fibronectin, collagen IV and laminin. Cancer Lett., 155, 199–205.
20 Lohi, J., Leivo, I., Oivula, J., Lehto, V.P. and Virtanen, I. (1998) Extra cellular matrix in renal cell carcinomas. Histol. Histopathol., 13, 785–796.
21 Chassin, D., Benifla, J.L., Delattre, C., Fernandez, H., Ginisty, D., Janneau, J.L., Prade, M., Contesso, G., Caillou, B. and Tournaire, M. (1994) Identification of genes overexpressed in tumors through preferential expression screening in trophoblasts. Cancer Res., 54, 5217–5223.
22 Lecomte, F., Szpirer, J. and Szpirer, C. (1997) The S3a ribosomal protein gene is identical to the Fte-1 (v-fos transformation effector) gene and the TNF-alpha-induced TU-11 gene, and its transcript level is altered in transformed and tumor cells. Gene, 186, 271–277.
23 Musholt, T.J., Goodfellow, P.J., Scheumann, G.F., Pichlmayr, R., Wells, S.A., Jr and Moley, J.F. (1997) Differential display in primary and metastatic medullary thyroid carcinoma. J. Surg. Res., 69, 94–100.
24 Naora, H., Takai, I., Adachi, M. and Naora, H. (1998) Altered cellular responses by varying expression of a ribosomal protein gene: sequential coordination of enhancement and suppression of ribosomal protein S3a gene expression induces apoptosis. Cell Biol., 141, 741–753.
25 Denko, N., Schindler, C., Koong, A., Laderoute, K., Green, C. and Giaccia, A. (2000) Epigenetic regulation of gene expression in cervical cancer cells by the tumor microenvironment. Clin. Cancer Res., 6, 480–487.