Methods for accurate quantification of LTR-retrotransposon copy number using short-read sequence data: a case study in Sorghum

Mol Genet Genomics. 2016 Oct;291(5):1871-83. doi: 10.1007/s00438-016-1225-9. Epub 2016 Jun 13.

Abstract

Transposable elements (TEs) are ubiquitous in eukaryotic genomes and their mobility impacts genome structure and function in myriad ways. Because of their abundance, activity, and repetitive nature, the characterization and analysis of TEs remain challenging, particularly from short-read sequencing projects. To overcome this difficulty, we have developed a method that estimates TE copy number from short-read sequences. To test the accuracy of our method, we first performed an in silico analysis of the reference Sorghum bicolor genome, using both reference-based and de novo approaches. The resulting TE copy number estimates were strikingly similar to the annotated numbers. We then tested our method on real short-read data by estimating TE copy numbers in several accessions of S. bicolor and its close relative S. propinquum. Both methods effectively identify and rank similar TE families from highest to lowest abundance. We found that de novo characterization was effective at capturing qualitative variation, but underestimated the abundance of some TE families, specifically families of more ancient origin. Also, interspecific reference-based mapping of S. propinquum reads to the S. bicolor database failed to fully describe TE content in S. propinquum, indicative of recent TE activity leading to changes in the respective repetitive landscapes over very short evolutionary timescales. We conclude that reference-based analyses are best suited for within-species comparisons, while de novo approaches are more reliable for evolutionarily distant comparisons.
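
The abstract does not spell out how copy number is computed, so what follows is only a generic depth-based sketch and not necessarily the procedure used in this study. It assumes that the number of copies of a TE family is approximated by the read coverage of that family's reference sequence divided by the genome-wide coverage. All function names, family names, and numbers are hypothetical and chosen for illustration.

```python
# Generic depth-based copy number sketch (an assumption, not the paper's stated method).
# All names and values below are hypothetical.

def genome_coverage(total_reads, read_length, genome_size):
    """Average sequencing depth across the genome."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_reads * read_length / genome_size


def estimate_copy_number(mapped_reads, read_length, element_length, coverage):
    """Coverage of the TE reference divided by genome-wide coverage."""
    if element_length <= 0 or coverage <= 0:
        raise ValueError("element_length and coverage must be positive")
    element_coverage = mapped_reads * read_length / element_length
    return element_coverage / coverage


def rank_families(read_counts, element_lengths, read_length, coverage):
    """Return (family, estimated copies) pairs sorted from most to least abundant."""
    estimates = {
        family: estimate_copy_number(
            read_counts[family], read_length, element_lengths[family], coverage
        )
        for family in read_counts
    }
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)


# Toy inputs: 7 million 100-bp reads over a 700-Mb genome gives 1x coverage.
coverage = genome_coverage(total_reads=7_000_000, read_length=100, genome_size=700_000_000)

read_counts = {"family_A": 50_000, "family_B": 6_000}      # reads mapped per family
element_lengths = {"family_A": 10_000, "family_B": 5_000}  # reference length in bp

ranking = rank_families(read_counts, element_lengths, read_length=100, coverage=coverage)
for family, copies in ranking:
    print(f"{family}: {copies:.0f} estimated copies")
```

A simple ratio like this relies on reads mapping to the family reference. That is consistent with the abstract's caution that mapping reads from one species against another species' TE database can fail to fully describe TE content.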

Keywords: Genome evolution; LTR-retrotransposons; Sorghum; Transposable elements.

MeSH terms

  • Computer Simulation
  • Gene Dosage*
  • Genetic Variation
  • Genome Size
  • Genome, Plant
  • Plant Leaves / genetics
  • Retroelements / genetics*
  • Sequence Analysis, DNA / methods*
  • Sorghum / genetics*

Substances

  • Retroelements