Comparative sequence analysis of a gene-rich cluster at human chromosome 12p13 and its syntenic region in mouse chromosome 6

Genome Res. 1998 Jan;8(1):29-40.

Abstract

The Human Genome Project has created a formidable challenge: the extraction of biological information from extensive amounts of raw sequence. With the increasing availability of genomic sequence from other species, one approach to extracting coding and regulatory element information is through cross-species sequence comparison. To assess the strengths and weaknesses of this methodology for large-scale sequence analysis, 227 kb of mouse sequence syntenic to a gene-rich cluster on human chromosome 12p13 was obtained. Primarily through percent identity plots (PIPs) of SIM comparative sequence alignments, the sequence of coding regions, putative alternative exons, conserved noncoding regions, and correlation in repetitive element insertions were easily determined. The analysis demonstrated that the number, order, and orientation of all 17 genes are conserved between the two species, whereas two human pseudogenes are absent in mouse. In addition, apart from MIRs, most repetitive elements show no direct correlation in distribution or position between the two species. Finally, in examining the synonymous and nonsynonymous substitution rates in the conserved genes, a large variation in nonsynonymous rates is observed, indicating that the genes in this region are diverging at different rates. This study indicates the utility and strength of large-scale cross-species sequence comparisons in the extraction of biological information from raw sequence, especially when combined with other computational tools such as GRAIL and BLAST.
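
As a rough illustration of the data behind a percent identity plot, the sketch below splits a gapped pairwise alignment into gap-free segments and reports each segment's position in the first sequence together with its percent identity. This is a minimal sketch under stated assumptions, not the SIM or PIP software used in the study: the function name `gap_free_segments`, the `-` gap character, and the toy alignment are all hypothetical.

```python
def gap_free_segments(aligned_a, aligned_b):
    """Return (start, end, percent_identity) for each gap-free segment.

    start and end are 0-based, end-exclusive coordinates in the ungapped
    first sequence. Both inputs are rows of one pairwise alignment and
    use '-' for gaps (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    segments = []
    pos_a = 0          # position in the ungapped first sequence
    seg_start = None   # start of the current gap-free segment
    matches = 0        # identical columns in the current segment
    length = 0         # columns in the current segment

    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            # A gap in either row closes the current segment, if any.
            if length:
                segments.append(
                    (seg_start, seg_start + length, 100.0 * matches / length)
                )
            seg_start, matches, length = None, 0, 0
        else:
            if length == 0:
                seg_start = pos_a
            length += 1
            if x.upper() == y.upper():
                matches += 1
        if x != "-":
            pos_a += 1

    # Close a segment that runs to the end of the alignment.
    if length:
        segments.append((seg_start, seg_start + length, 100.0 * matches / length))
    return segments


# Toy alignment, invented for illustration only.
segments = gap_free_segments("ACGT-ACGTAC", "ACGAAAC-TAC")
for start, end, identity in segments:
    print(f"{start}-{end}: {identity:.1f}% identity")
```

Plotting each segment as a horizontal bar from `start` to `end` at height `identity` gives a picture in the spirit of a PIP; real tools typically also apply length and identity thresholds and annotate exons and repeats, which this sketch omits.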

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence / genetics
  • Animals
  • Chromosome Mapping
  • Chromosomes / genetics*
  • Chromosomes, Human, Pair 12 / genetics*
  • Conserved Sequence
  • Humans
  • Mice
  • Molecular Sequence Data
  • Multigene Family*
  • Repetitive Sequences, Nucleic Acid
  • Sequence Alignment
  • Sequence Analysis, DNA

Associated data

  • GENBANK/AC002397