Datasets


Results from the MMTV-induced tumor study

To test whether genetic background affects gene expression in MMTV-induced mammary tumors, and to identify differentially expressed genes, three mice from each of four mouse strains (HeN, Hej, YbR, and BALB) were compared with a standard RNA reference (the Stratagene reference) by gene expression profiling on Ontario mouse 15k microarrays. Each mouse was compared with the reference in a dye-swap fashion, so the whole experiment comprises 12 dye-swap pairs on 24 slides.
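The slide layout described above can be enumerated in a few lines. This is only a sketch: the dye names (Cy5, Cy3), the slide numbering, and the record fields are assumptions for illustration, not details taken from the study.

```python
# Sketch of the reference design with dye swaps: 4 strains x 3 mice x 2 slides.
# Dye names and slide numbering are assumed for illustration only.
STRAINS = ["HeN", "Hej", "YbR", "BALB"]
MICE_PER_STRAIN = 3


def build_design():
    """Return one record per slide; each mouse contributes one dye-swap pair."""
    slides = []
    slide_id = 1
    for strain in STRAINS:
        for mouse in range(1, MICE_PER_STRAIN + 1):
            # First slide: sample in one channel, reference in the other.
            # Second slide: the two dyes are exchanged (the dye swap).
            for cy5, cy3 in (("sample", "reference"), ("reference", "sample")):
                slides.append(
                    {"slide": slide_id, "strain": strain, "mouse": mouse,
                     "Cy5": cy5, "Cy3": cy3}
                )
                slide_id += 1
    return slides


design = build_design()
pairs = {(s["strain"], s["mouse"]) for s in design}
print(len(design), "slides,", len(pairs), "dye-swap pairs")
```

Counting the records reproduces the totals given in the text: 24 slides forming 12 dye-swap pairs.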

After hybridization, the slides were scanned using an Axon scanner, and the 16-bit TIFF images were gridded using SPOT (CSIRO Mathematical and Information Sciences). The background-subtracted (morph) mean values were log-transformed and pre-normalized using local lowess to remove spatial and intensity-related biases. The data were then analyzed using MAANOVA 1.2 to remove the dye and spot effects and to estimate the relative expression level of each variety for each gene, using the model y = µ + A + D + V + e, where µ is the gene mean, A the array (spot) effect, D the dye effect, V the variety effect, and e the residual error.
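To illustrate why a dye-swap pair lets the dye effect be separated from the variety effect, the following minimal sketch averages the log ratios of one pair. It is a simplification, not the MAANOVA fit itself; the function name, the intensities, and the assumed two-fold dye bias are all hypothetical.

```python
import math


def dye_swap_estimate(forward, swapped):
    """Estimate log2(sample/reference) from one dye-swap pair.

    forward: (cy5, cy3) intensities with the sample in Cy5 (assumed orientation).
    swapped: (cy5, cy3) intensities with the sample in Cy3.
    A multiplicative dye bias adds the same constant to both log ratios,
    so it cancels in the difference.
    """
    m_forward = math.log2(forward[0] / forward[1])  # log2(sample/ref) + bias
    m_swapped = math.log2(swapped[0] / swapped[1])  # log2(ref/sample) + bias
    return (m_forward - m_swapped) / 2


# Hypothetical spot: sample = 400, reference = 100, and Cy5 reads twice too high.
forward = (400 * 2, 100)   # sample in Cy5, reference in Cy3
swapped = (100 * 2, 400)   # reference in Cy5, sample in Cy3
estimate = dye_swap_estimate(forward, swapped)
print(estimate)
```

With these made-up numbers the true log2 ratio is 2, and the pair average recovers it even though each slide alone is off by one log2 unit.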

Three types of F test were used to identify genes expressed differently among the mouse strains, by comparing the alternative model (which gives each strain a unique variety ID) with the null model (which assigns all mice the same variety ID). F1 is similar to a t-test and assumes that each gene has its own error distribution. F3 is a statistically formalized version of fold change that captures the magnitude of the expression difference; it assumes that all genes on the arrays share the same error distribution. F2 is a hybrid of F1 and F3: it combines one half of the gene-specific error with one half of the mean error across all genes on the arrays. Permutation of the residuals of the ANOVA model was used to establish the statistical significance of F2 and F3 without distributional assumptions. The volcano plot (Figure 1) shows that 42 spots are significantly different by the criteria that the tabulated P value of F1 (F1.Ptab) is less than 0.001, the multiple-test-corrected P value of F2 (F2.Pvalmax) is less than 0.05, and the multiple-test-corrected P value of F3 (F3.Pvalmax) is less than 0.05. Requiring all three F tests ensures both a certain magnitude of expression difference and the reproducibility of the selected genes.
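The relationship between the three statistics can be sketched as follows. This is an illustration under stated assumptions, not the MAANOVA implementation: the common error is taken here as the plain mean of the gene-specific variances, and the function name and input values are hypothetical.

```python
def f_statistics(ms_effect, gene_var):
    """Return (F1, F2, F3) for each gene.

    ms_effect[g]: mean square explained by the strain term for gene g.
    gene_var[g]:  residual variance estimate for gene g.
    Assumption: the common error is the mean of the gene-specific variances.
    """
    pooled_var = sum(gene_var) / len(gene_var)
    results = []
    for ms, var in zip(ms_effect, gene_var):
        f1 = ms / var                                 # gene-specific error only
        f2 = ms / (0.5 * var + 0.5 * pooled_var)      # half gene-specific, half common
        f3 = ms / pooled_var                          # common error only
        results.append((f1, f2, f3))
    return results


# Two hypothetical genes with the same effect size but different noise levels.
stats = f_statistics(ms_effect=[4.0, 4.0], gene_var=[0.5, 1.5])
for f1, f2, f3 in stats:
    print(round(f1, 3), round(f2, 3), round(f3, 3))
```

In this toy case F3 is identical for both genes because it ignores gene-specific noise, F1 differs strongly between them, and F2 falls between F1 and F3 for each gene.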

 

Figure 1. Volcano plot for F test results

Because each gene is spotted twice on the Ontario mouse 15k arrays, only genes for which both replicate spots meet the above criteria are considered top candidates; these are shown in Gene List 1.
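The selection rule, all three thresholds on both replicate spots, can be written as a short filter. The spot numbers, gene labels, P values, and the spot-to-gene mapping below are invented for illustration; only the three thresholds come from the text.

```python
def passes_all_three(p):
    """Apply the three criteria from the text to one spot's P values."""
    return (p["F1.Ptab"] < 0.001
            and p["F2.Pvalmax"] < 0.05
            and p["F3.Pvalmax"] < 0.05)


def top_candidates(spot_pvalues, spot_to_gene):
    """Keep genes whose two replicate spots both pass all three tests."""
    flags_by_gene = {}
    for spot, p in spot_pvalues.items():
        flags_by_gene.setdefault(spot_to_gene[spot], []).append(passes_all_three(p))
    return sorted(g for g, flags in flags_by_gene.items()
                  if len(flags) == 2 and all(flags))


# Hypothetical data: geneA passes on both spots, geneB fails F3 on one spot.
spot_pvalues = {
    101: {"F1.Ptab": 0.0002, "F2.Pvalmax": 0.01, "F3.Pvalmax": 0.02},
    102: {"F1.Ptab": 0.0005, "F2.Pvalmax": 0.03, "F3.Pvalmax": 0.04},
    201: {"F1.Ptab": 0.0001, "F2.Pvalmax": 0.02, "F3.Pvalmax": 0.01},
    202: {"F1.Ptab": 0.0003, "F2.Pvalmax": 0.04, "F3.Pvalmax": 0.20},
}
spot_to_gene = {101: "geneA", 102: "geneA", 201: "geneB", 202: "geneB"}
candidates = top_candidates(spot_pvalues, spot_to_gene)
print(candidates)
```

Requiring agreement between the duplicate spots acts as an extra reproducibility check on top of the three F tests.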

Spot Index  GenBank ID  Gene Description

GROUP 1
18282       BG078291    Mus musculus cyclin D2 (Ccnd2), mRNA

GROUP 2
8604        BG065626
22674       BG075676
30596       BG063668

GROUP 3
15180       BG067341
18048       BG074047
21678       BG067620
22630       BG074388
23620       BG067670
24652       BG076041
27738       AU046252
28830       BG067439
30064       BG066678
30482       BG075407
30798       BG068331

GROUP 4
8166        BG071318
16802       BG075190
23780       BG071239

GROUP 5
21122       BG070089    Mus musculus tumor-associated calcium signal transducer 1 (Tacsd1), mRNA

Not in any group
21564       BG065396    Homo sapiens mRNA; cDNA DKFZp586L2123 (from clone DKFZp586L2123)

Gene List 1. Significant genes for all three F tests (F1.Ptab < 0.001; F2.Pvalmax < 0.05; F3.Pvalmax < 0.05)

K-means clustering of the gene expression values (the VG values) groups these 20 genes into 5 groups. The VG profiles for each group are shown in Figure 2.
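A minimal K-means sketch on VG-like profiles is given below. It is not the clustering code used for Figure 2: the deterministic start from the first k profiles, the squared Euclidean distance, and the toy four-strain profiles are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=50):
    """Cluster profiles into k groups; returns (labels, centers)."""
    centers = [list(p) for p in profiles[:k]]  # assumed start: first k profiles
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in profiles]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers


# Hypothetical VG profiles across four strains for four genes.
profiles = [
    [1.0, 1.0, -1.0, -1.0],
    [0.9, 1.1, -1.0, -0.9],
    [-1.0, -1.0, 1.0, 1.0],
    [-1.1, -0.9, 1.0, 1.2],
]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

The two genes that are high in the first two strains end up in one group and the two that are high in the last two strains in the other, which is the kind of shared profile shape each group is meant to capture.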

 

Figure 2. VG profiles of the 20 genes

Because the F2 test combines the strengths of F1 and F3, guarding against both genes with low reproducibility and genes with small expression differences, the full list of genes with F2.Pvalmax < 0.05 was also compiled. It includes the 20 genes above and 42 more. In the volcano plot (Figure 1), these genes are represented by red points. See Gene List 2 for the genes significant by the F2 test.

Spot Index  GenBank ID  Gene Description

1708        BG071503
2556        BG075927
2714        BG065308    Homo sapiens KIAA0396 mRNA, partial cds
4230        BG069868
5812        BG076059
5902        BG063471    Mus musculus fibrillarin (Fbl), mRNA
7576        BG085134    M.musculus Cd24a gene
7668        AW550650    Mus musculus t-complex testis expressed 1 (Tctex1), mRNA
8166        BG071318
8604        BG065626
10064       BG070224
11752       BG076877    Mouse creatine kinase B gene, complete cds
13354       BG070310
13880       BG067352
15180       BG067341
15730       BG065030    M.musculus GSHPx gene
16802       BG075190
17482       BG075397    Mouse CFh locus, complement protein H gene, complete cds, clones MH(4,8)
18048       BG074047
18282       BG078291    Mus musculus cyclin D2 (Ccnd2), mRNA
18672       BG086330    Mus musculus microsomal glutathione S-transferase (Gst), mRNA
19222       AU040587
20004       BG073920    Mus musculus lactate dehydrogenase 2, B chain (Ldh2), mRNA
20278       BG078506    Homo sapiens mRNA; cDNA DKFZp566G2246 (from clone DKFZp566G2246)
20546       BG071905    Homo sapiens mRNA; cDNA DKFZp586L0518 (from clone DKFZp586L0518)
20650       AW552541
20920       BG078804    Mus musculus BMP-4 gene, complete cds
20992       BG067264
21010       BG080888    Homo sapiens cDNA FLJ13397 fis, clone PLACE1001351
21122       BG070089    Mus musculus tumor-associated calcium signal transducer 1 (Tacsd1), mRNA
21330       BG074398    Mus musculus extracellular matrix protein 2 (Ecm2), mRNA
21564       BG065396    Homo sapiens mRNA; cDNA DKFZp586L2123 (from clone DKFZp586L2123)
21678       BG067620
22070       BG075854    M.musculus ufo mRNA
22162       BG077665    Mus musculus low density lipoprotein receptor related protein (Lrp), mRNA
22232       BG066232    Mus musculus high mobility group protein I, isoform C (Hmgic), mRNA
22630       BG074388
22674       BG075676
22806       BG064504
23442       BG063693    M.musculus gas5 growth arrest specific gene, exons 4-12
23484       BG078467    Mus spretus endogenous proviral sequence S3
23550       BG065738    Homo sapiens Rho guanine nucleotide exchange factor (GEF) 3 (ARHGEF3), mRNA
23620       BG067670
23658       BG080910    M.musculus of protein S gene, complete CDS
23780       BG071239
23794       BG071761
24652       BG076041
26108       BG065586    Homo sapiens cDNA FLJ13069 fis, clone NT2RP3001752
26254       BG068139
26470       BG073468    M.musculus DNA for alpha globin gene and flanking regions
27168       BG073624
27738       AU046252
27746       BG073049
28076       BG065049    Mouse pro-alpha1 (II) collagen chain gene, complete cds
28386       BG085072    Homo sapiens epididymis-specific, whey-acidic protein type, four-disulfide core; putative ovarian carcinoma marker (HE4), mRNA
28740       BG066208
28830       BG067439
28974       BG083987    Mouse (clone pIL2) B1 dispersed repeat unit
30064       BG066678
30482       BG075407
30596       BG063668
30798       BG068331

Gene List 2. Significant genes for F2 test

To identify genes that are differentially expressed among the mice regardless of strain, an alternative model that gives every mouse a unique variety ID was compared with a null model that assigns all mice the same variety ID. The F tests comparing these two models identified 24 genes that are significant by all three types of F test described above. Seven of these 24 genes also appear in Gene List 1. The expression of these 24 genes was used to cluster the 12 mice hierarchically. The three BALB mice cluster together, and two of the three YbR mice cluster together. The remaining mice cannot be clustered with high confidence, except that mouse 2 of the HeN strain stands alone, which indicates that the expression profile of this mouse is unusual.
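As a rough illustration of how the mice are grouped, here is a small average-linkage agglomerative clustering sketch. It does not reproduce the consensus tree of Figure 3; the linkage rule, the Euclidean distance, the sample names, and the two-gene expression vectors are all hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(samples):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(euclidean(samples[a], samples[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


# Hypothetical expression vectors (two genes) for four mice.
samples = {
    "BALB_1": [1.0, 0.0],
    "BALB_2": [1.1, 0.1],
    "BALB_3": [0.9, 0.0],
    "HeN_2": [5.0, 5.0],
}
merges = average_linkage(samples)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy example the three similar samples merge first and the outlying sample joins only at the final step, which is the pattern described for mouse 2 of the HeN strain.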

 

Figure 3. Consensus tree from sample hierarchical clustering

F tests were then conducted to identify the genes that differ significantly between mouse 2 of the HeN strain and the rest of the mice. First, mouse 2 was compared with the mean of all other mice in the experiment. Only three genes were significant according to all three types of F test.

Spot Index  GenBank ID  Gene Description

17942       BG071923    Mus musculus myosin light chain 2 (Mlc2), mRNA
24302       BG068317
30270       BG071387    Homo sapiens mRNA; cDNA DKFZp564A132 (from clone DKFZp564A132)

Gene List 3. Genes that differ between mouse 2 of strain HeN and the rest of the mice

The profile of VGs of these genes in all the mice is shown in Figure 4.

 

Figure 4. VG profile of mouse 2 versus the rest of the mice

Next, a different pair of null and alternative models was used, in which each mouse strain is assigned a unique variety ID and the test is for the difference between mouse 2 and the other two mice in its strain. Six genes were identified by all three F tests, and F2 alone identified 28 genes. The VG profiles and the gene list are shown in Figure 5 and Gene List 4.
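The model comparisons in this report differ only in how variety IDs are assigned to the 12 mice. The sketch below writes those assignments out explicitly. The ID numbering is arbitrary, and the last pair of models is one possible reading of the description above (mouse 2 of HeN receives its own ID in the alternative model), so treat it as an assumption.

```python
STRAINS = ["HeN", "Hej", "YbR", "BALB"]
MICE = [(strain, mouse) for strain in STRAINS for mouse in (1, 2, 3)]
OUTLIER = ("HeN", 2)  # the mouse that stands apart in the clustering


def variety_ids(rule):
    """Apply a labelling rule to every mouse, in a fixed order."""
    return [rule(strain, mouse) for strain, mouse in MICE]


def strain_id(strain, mouse):
    return STRAINS.index(strain) + 1


models = {
    "strain effect": {
        "null": variety_ids(lambda s, m: 1),
        "alt": variety_ids(strain_id),
    },
    "mouse effect": {
        "null": variety_ids(lambda s, m: 1),
        "alt": variety_ids(lambda s, m: MICE.index((s, m)) + 1),
    },
    "mouse 2 vs all others": {
        "null": variety_ids(lambda s, m: 1),
        "alt": variety_ids(lambda s, m: 2 if (s, m) == OUTLIER else 1),
    },
    # Assumed reading: strains keep their IDs; the outlier gets an extra one.
    "mouse 2 vs its own strain": {
        "null": variety_ids(strain_id),
        "alt": variety_ids(lambda s, m: 5 if (s, m) == OUTLIER else strain_id(s, m)),
    },
}

for name, pair in models.items():
    print(name, len(set(pair["null"])), "->", len(set(pair["alt"])), "varieties")
```

Each F test then asks whether the extra variety levels in the alternative model explain significantly more variation than the null model.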

 

Figure 5. VG profile of mouse 2 vs the rest of the mice - another test

Spot Index  GenBank ID  Gene Description

GROUP 1
19940       BG072209    Mus musculus sulfated glycoprotein-2 isoform 1 mRNA, complete cds
20808       BG063515    Mus musculus ferritin heavy chain (Fth), mRNA

GROUP 2
3808        BG087551    Homo sapiens UDP-glucose pyrophosphorylase 2 (UGP2), mRNA
14716       BG071468
16576       BG070063    Homo sapiens cDNA: FLJ21267 fis, clone COL01717
21226       BG072404    Homo sapiens mRNA for KIAA0828 protein, partial cds
21348       BG075211    Mouse adipose differentiation-related protein (ADRP) gene, exons 1-8
21482       BG077235    Mus musculus nucleobindin 2 (Nucb2), mRNA
22494       BG084593    Mus musculus EIG-1 (Eig1), mRNA
27788       BG073260

GROUP 3
5662        BG085427    Mus musculus high mobility group protein 2 (Hmg2) gene, complete cds
14114       BG072533    Mus musculus heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1), mRNA
15830       BG067430    Mus musculus H3 histone, family 3B (H3f3b), mRNA

GROUP 4
8580        BG064947
14128       BG073336
14474       BG066300
14998       BG063486
16684       BG084836    Human DNA for voltage-dependent calcium channel alpha1 subunit (CACN4), exon 48
17662       BG065548
17664       BG065383
23092       BG083644    Mus musculus P450 (cytochrome) oxidoreductase (Por), mRNA

GROUP 5
12040       BG070902    Mus musculus p8 protein (p8) gene, complete cds
19280       BG084947    Mus musculus p8 protein (P8-pending), mRNA

GROUP 6
2824        BG067549    Mouse Ig germline kappa V-region gene V-kappa-24A
17942       BG071923    Mus musculus myosin light chain 2 (Mlc2), mRNA
21774       BG083088    Mus musculus cyclin D1 (Ccnd1), mRNA
24302       BG068317
30270       BG071387    Homo sapiens mRNA; cDNA DKFZp564A132 (from clone DKFZp564A132)

Gene List 4. Genes that differ between mouse 2 of strain HeN and the other mice of its strain (second test)

Since each mouse is the true replicate unit in this experiment, we extracted the VG effect for each mouse and performed a one-way ANOVA on these VGs to identify genes that are differentially expressed among strains. As before, three F tests using different error estimates were constructed. The power of these F tests is low because there are only 12 data points for each gene. No gene was identified by the multiple-test-corrected F2 and F3 tests, even at the 0.05 level, while 28 genes were identified by F1 at Ptab < 0.001, 10 of which belong to Gene List 1. The VG profiles of these 28 genes are shown in Figure 6.
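The per-gene one-way ANOVA on 12 VG values can be sketched as below. The VG numbers are invented for illustration, and the function computes only the classical F statistic with gene-specific error (the F1 analogue), not the F2 or F3 variants or their permutation P values.

```python
def one_way_f(groups):
    """Classical one-way ANOVA: returns (F, df_between, df_within)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical VG values for one gene: 4 strains x 3 mice.
vg_by_strain = [
    [1.0, 1.2, 0.8],   # HeN
    [1.1, 0.9, 1.0],   # Hej
    [2.0, 2.2, 1.8],   # YbR
    [1.0, 1.1, 0.9],   # BALB
]
f_value, df_between, df_within = one_way_f(vg_by_strain)
print(round(f_value, 2), df_between, df_within)
```

With four strains and three mice each, the test has only 3 and 8 degrees of freedom, which is why its power is limited.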

 

Figure 6. VG profiles of the 28 genes identified by F1 (Ptab < 0.001)

Groups 3, 4, 6 and 7 show very small VG differences among the mouse strains, which is why their genes were not selected by the criteria of all three F tests in Figure 1. They are nevertheless significant according to F1, because the variation within each strain is also small.