The Mouse Transcriptome Project

The completion of the sequencing of the human genome in April 2003 represented an enormous scientific achievement and the start of a new era in biomedical research. However, it also brought into relief the reality that having the sequence is only the start in determining the function and therapeutic potential for all human genes. This is arguably a much more difficult and complex task, and technologies and datasets analogous to the sequence itself need to be created and made available to the research community to enable this task.

One of the first questions asked about a novel gene, as a clue to its function, is its tissue localization, i.e., where the gene is expressed. While not sufficient in itself to establish function, such information is often invaluable in determining candidate functions for a given unknown gene, and candidate genes for a given physiological process. Gathering this information (frequently by Northern or dot blot) is often the first step a laboratory scientist takes when confronted with a gene of unknown function. A searchable database of the tissue expression of every gene would be widely used by researchers, would reduce duplicated effort, and would accelerate the transition of genome information from sequence to biological and disease-related function.

This project utilizes Massively Parallel Signature Sequencing (MPSS) to profile RNA populations from a large number of rigorously collected mouse tissues. MPSS utilizes microbeads to capture and quantify tags for mRNAs isolated from tissues or cells; identity of the mRNAs is determined by sequencing of cDNA fragments attached to the beads. The C57BL/6J mouse was chosen as the source for tissues since the mouse offers consistency of genetic background, premorbid state, and tissue acquisition, as well as detailed genome information (this is the same strain from which the mouse genomic sequence was derived). These are essential elements for the creation of a meaningful gene expression dataset.

In this project, the major sources of expression level variability (inter-individual, dissection, and temporal, i.e., circadian) were minimized. Inter-individual variability was minimized by using a single inbred mouse strain from a single mouse room of a single supplier (The Jackson Laboratory), a single age (10 weeks), and pools of at least 5 animals per tissue sample. Male and female tissues were profiled separately to reveal genes that are sexually dimorphic in their expression. Circadian variability was minimized by performing all dissections during the same 90-minute window each day. Variability induced by dissection and RNA degradation was minimized by having experienced and practiced dissectors isolate the tissue samples extremely rapidly, by using RNA preservation buffers, by homogenizing tissues immediately, and by pooling tissue from at least 5 animals per sample.

In the current dataset, 90 samples are available for analysis. For each tissue, 2 million MPSS tags were sequenced, and tag abundances are reported as transcripts per million, or tpm. This digital reporting format is similar to that used for the sequencing of Serial Analysis of Gene Expression (SAGE) tags. The data are available in queryable form, mapped to the mouse genome, at http://sgbpub.lynxgen.com, or at http://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/geo.
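
As an illustration of the tpm reporting format, the following minimal Python sketch scales raw tag counts from one sample to transcripts per million. It is not the project's actual pipeline: the function name `to_tpm`, the signature labels, and the counts are all hypothetical, and it assumes tpm is simply each tag's count divided by the total number of tags sequenced in the sample, multiplied by one million.

```python
from typing import Dict


def to_tpm(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts for one sample to transcripts per million (tpm).

    Assumption: tpm = (tag count / total tags in the sample) * 1,000,000.
    """
    total = sum(raw_counts.values())
    if total <= 0:
        raise ValueError("sample contains no tags")
    return {sig: count * 1_000_000 / total for sig, count in raw_counts.items()}


# Hypothetical signature labels and counts, for illustration only.
# The counts sum to 2,000,000 tags, matching the sequencing depth per tissue.
sample = {"sig_A": 1_500, "sig_B": 500, "sig_other": 1_998_000}
tpm = to_tpm(sample)
print(tpm["sig_A"])  # 1,500 tags out of 2,000,000 -> 750.0 tpm
```

Because every sample is scaled to the same total, tpm values can be compared across tissues even if the raw number of tags sequenced differs slightly from sample to sample.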