Gene Expression Arrays

Overview
Analysis of variance for expression data
Experimental design for microarrays
Bootstrapping cluster analysis
Design and analysis of microarrays
Statistical analysis of a gene expression microarray experiment with replication
Analysis of a designed microarray experiment

Gene expression technology has seduced the genomics community with its power and promise to unravel the genetic program. However, the experimental paradigms for this technology are not yet fully developed. For example, the response function (signal as a function of RNA concentration) has not been carefully studied for most technologies. There are many levels at which experimental error and noise can enter into the system.

We have focused our initial efforts on two problems.

The first is to estimate the baseline error variance in an array experiment, and the second is to normalize results across multiple chips. We have been investigating classical experimental designs in the context of spotted cDNA arrays and have developed designs that solve both of these problems at a modest cost in additional chips. Classical design concepts have not previously been applied to gene expression technology. The key observation is that the paired comparisons in two-dye systems impose an incomplete block structure on the experiment: each array is a block that holds only two samples, so when more than two samples are studied, no single array can contain them all.
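To make the incomplete block structure concrete, the following minimal sketch lays out one possible design in which each two-dye array is a block of size two. The loop arrangement, the function names (loop_design, dye_counts), and the sample labels are assumptions chosen for illustration; the text above does not specify which designs were developed.

```python
from collections import Counter


def loop_design(samples):
    """Return one (red, green) sample pair per array, arranged in a loop.

    Array i pairs sample i (red dye) with sample i+1 (green dye), wrapping
    around at the end, so every sample is hybridized once in each dye.
    This is an illustrative layout, not necessarily the authors' design.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("a two-dye design needs at least two samples")
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def dye_counts(design):
    """Count how often each sample appears in the red and green channels."""
    red = Counter(r for r, _ in design)
    green = Counter(g for _, g in design)
    return red, green


# Hypothetical sample labels; each array (block) holds only two of the four.
design = loop_design(["A", "B", "C", "D"])
for array_id, (red, green) in enumerate(design, start=1):
    print(f"array {array_id}: red={red} green={green}")

red, green = dye_counts(design)
print("balanced in dyes:", all(red[s] == green[s] == 1 for s in "ABCD"))
```

With two samples the loop reduces to a dye swap, in which the same pair is hybridized twice with the dyes reversed. In a layout of this kind every sample is measured on two arrays, which provides the replication needed to estimate error variance.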

We have chosen to pursue the spotted clone array technology for a number of reasons, including cost and flexibility, and have begun construction of our own chips, designed to include replication at all levels of the experiment, both within and across chips. The first planned applications are some simple one- and two-factor experiments, based on our new experimental designs, applied to a study of liver tissue from normal and diabetic mice on drug treatments. If costs can be driven down, it may be possible to map the loci responsible for variations in expression in an F2 or backcross population.

In anticipation of a flood of data from our own chips, and perhaps also from outside collaborators, we are developing a database prototype. Efficient databases will be essential to the development and application of analysis tools. Our first priority will be to establish baseline quality measures for expression data and then to develop a flexible storage and retrieval system. Further development of analysis methods will be driven by the biological questions that motivated our experiments. Where data mining involves multiple comparisons across many genes, we will address the resulting issues using permutation analysis.
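As an illustration of how permutation analysis can account for multiple comparisons, the sketch below computes adjusted p-values by comparing each gene's statistic with the permutation distribution of the maximum statistic over all genes. The choice of a max-statistic scheme, the mean-difference statistic, the function names, and the toy data are all assumptions; the text above does not say which permutation procedure will be used.

```python
import random


def mean_diff(values, labels):
    """Difference in mean expression between arrays labeled 1 and 0."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def max_stat_permutation(data, labels, n_perm=1000, seed=0):
    """Adjusted p-values from the permutation distribution of the max statistic.

    data   : list of genes, each a list with one expression value per array
    labels : 0/1 group label per array (both groups must be non-empty)
    """
    rng = random.Random(seed)
    observed = [abs(mean_diff(gene, labels)) for gene in data]
    exceed = [0] * len(data)
    shuffled = list(labels)
    for _ in range(n_perm):
        # Shuffle array labels; group sizes are preserved.
        rng.shuffle(shuffled)
        max_stat = max(abs(mean_diff(gene, shuffled)) for gene in data)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add one to numerator and denominator so no p-value is exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Toy data (made up for illustration): gene 0 differs between groups, gene 1 does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [
    [10.0, 10.1, 9.9, 10.0, 0.0, 0.1, -0.1, 0.0],
    [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05],
]
p_values = max_stat_permutation(data, labels)
print(p_values)
```

Because each gene is compared against the largest statistic seen anywhere on the array under permutation, the adjustment controls for the number of genes tested without assuming a particular error distribution.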