Sample GSM4943519
Status Public on Sep 21, 2021
Title rrmm_lv1
Sample type SRA
 
Source name Liver
Organism Mus musculus
Characteristics tissue: Liver
genotype: wild type
age: 9 weeks
treatment: RNase R-treated
Extracted molecule total RNA
Extraction protocol Tissue was dissected at the indicated ages (see section "Samples" in the GEO metadata). RNA for mouse, rat, rhesus and human was extracted using the RNeasy protocol from QIAGEN; for opossum, a TRIzol/chloroform-based protocol was used. Tissues were stored at -80 °C until library preparation.
We used 5 µg of RNA per tissue as starting material for library preparation. The RNA was treated with RNase R for 1 h at 37 °C to degrade linear RNAs, followed by RNA purification with the RNA Clean & Concentrator-5 kit (Zymo Research) according to the manufacturer's protocol. Libraries were prepared from the purified RNA with the Illumina TruSeq Stranded Total RNA with Ribo-Zero Gold kit according to the protocol, with the following modifications to select larger fragments: 1) instead of the recommended 8 min at 68 °C for fragmentation, we incubated samples for only 4 min at 68 °C to increase the fragment size; 2) in the final PCR clean-up after enrichment of the DNA fragments, we changed the 1:1 ratio of DNA to AMPure XP Beads to a 0.7:1 ratio to select for binding of larger fragments. Library quality was assessed on the fragment analyzer, and libraries were sequenced on the Illumina HiSeq 2500 platform.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2500
 
Description mmCircRNAs.gtf
mmCircRNA_cpms.txt
Data processing Read quality assessment and mapping: Basecalling was performed with the Illumina Casava software (v 1.9) and read quality was assessed with FastQC (v 0.10.1). The Ensembl annotations for opossum (monDom5), mouse (mm10), rat (rn5), rhesus macaque (rheMac2) and human (hg38) were used to build transcriptome indexes for mapping with TopHat2 (v 2.0.11). TopHat2 was run with default settings, except that the --mate-inner-dist and --mate-std-dev options were set to 50 and 200, respectively. The mate-inner-distance parameter was estimated based on the fragment analyzer report.
CircRNA annotation: We developed a custom pipeline to detect circRNAs, which performs the following steps: Unmapped reads with a phred quality value of at least 25 are used to generate 20 bp anchor pairs from the terminal 3' and 5' ends of the read. Anchors are remapped with bowtie2 (v 2.1.0) to the reference genome. Mapped anchor pairs are filtered for 1) being on the same chromosome, 2) being on the same strand and 3) having a genomic mapping distance to each other of at most 100 kb. Next, anchors are extended upstream and downstream of their mapping locus. They are kept if pairs are extendable to the full read length. During this procedure, a maximum of two mismatches is allowed. Next, all unpaired reads (singletons) are selected from the accepted_hits.bam file generated by TopHat2 and assessed for whether the mate read (second read of a paired-end sequencing read) of the anchor pair mapped between the backsplice coordinates. All anchor pairs for which 1) the mate did not map between the genomic backsplice coordinates, 2) the mate mapped to another backsplice junction or 3) the extension procedure could not reveal a clear breakpoint are removed. Based on the remaining candidates, a backsplice index is built with bowtie2 and all reads are remapped to this index to increase the read coverage by detecting reads that cover the backsplice junction (BSJ) with fewer than 20 bp, but at least 8 bp. Candidate reads that were used to build the backsplice index and now map to another backsplice junction are removed. After this procedure, the pipeline provides a first list of backsplice junctions. The set of scripts that performs the identification of putative BSJs, as well as a short description of how to run the pipeline, is deposited on GitHub (https://github.com/Frenzchen/ncSplice_circRNAdetection).
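The anchor generation and the three pair filters described above can be sketched in a few lines of Python. This is only an illustration and not the code from the ncSplice repository: the function names and the tuple layout for a mapped anchor are hypothetical, and applying the phred threshold of 25 to the mean base quality of the read is an assumption, since the record does not state how the threshold is evaluated.

```python
from statistics import mean

ANCHOR_LEN = 20          # anchor length in bp, as stated in the record
MIN_QUALITY = 25         # phred threshold, as stated in the record
MAX_DISTANCE = 100_000   # maximum genomic distance between anchors (100 kb)


def make_anchors(read, qualities):
    """Return the (5' anchor, 3' anchor) pair of a read, or None if rejected.

    Assumption: the quality threshold is applied to the mean phred score.
    """
    if len(read) < 2 * ANCHOR_LEN:
        return None  # too short to yield two non-overlapping anchors
    if mean(qualities) < MIN_QUALITY:
        return None
    return read[:ANCHOR_LEN], read[-ANCHOR_LEN:]


def is_candidate_pair(anchor_a, anchor_b):
    """Apply the three filters to two mapped anchors.

    Each anchor is a hypothetical (chromosome, strand, position) tuple.
    """
    chrom_a, strand_a, pos_a = anchor_a
    chrom_b, strand_b, pos_b = anchor_b
    return (
        chrom_a == chrom_b
        and strand_a == strand_b
        and abs(pos_a - pos_b) <= MAX_DISTANCE
    )
```

For example, `is_candidate_pair(("chr1", "+", 5_000), ("chr1", "+", 60_000))` passes all three filters, whereas a pair on different strands or more than 100 kb apart does not.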
CircRNA expression levels: CPM (counts per million) values for backsplice junctions were calculated for each tissue as follows: counts = mean(counts_rep1, counts_rep2, counts_rep3); totalMappedReads = mean(mappedReads_rep1, mappedReads_rep2, mappedReads_rep3); CPM = counts * 10^6 / totalMappedReads. To distinguish putative backsplice junctions from the technical and biological noise background, the enrichment of the previously defined junctions in RNase R-treated samples was calculated. The enrichment was defined as the CPM increase in RNase R-treated versus untreated samples: enrichment = CPM_RNaseR / CPM_untreated. Candidates with a log2-enrichment smaller than 1.5 were removed. Expression levels for all circRNAs can be found in the files mdCircRNA_cpms.txt (opossum), mmCircRNA_cpms.txt (mouse), rnCircRNA_cpms.txt (rat), rmCircRNA_cpms.txt (rhesus macaque) and hsCircRNA_cpms.txt (human).
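The CPM and enrichment formulas above translate directly into code. The following is a minimal sketch with hypothetical function names. The handling of junctions with zero CPM in either condition is an assumption, because the record does not describe that case.

```python
import math
from statistics import mean


def cpm(replicate_counts, replicate_mapped_reads):
    """CPM = mean(counts) * 10^6 / mean(mapped reads), over the replicates."""
    return mean(replicate_counts) * 1e6 / mean(replicate_mapped_reads)


def log2_enrichment(cpm_treated, cpm_untreated):
    """log2 of the CPM ratio, RNase R-treated over untreated.

    Assumption: zero CPM values map to -inf / +inf instead of raising.
    """
    if cpm_treated == 0:
        return -math.inf
    if cpm_untreated == 0:
        return math.inf
    return math.log2(cpm_treated / cpm_untreated)


def is_enriched(cpm_treated, cpm_untreated, min_log2=1.5):
    """Keep a junction unless its log2-enrichment is smaller than 1.5."""
    return log2_enrichment(cpm_treated, cpm_untreated) >= min_log2
```

With three replicates of 10, 20 and 30 junction reads and 1, 2 and 3 million mapped reads, `cpm` returns 10.0. A junction with a CPM of 8 after treatment and 2 before has a log2-enrichment of 2.0 and is kept.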
CircRNA transcript reconstruction: To reconstruct the exon structure of circRNA transcripts in each tissue, we made use of the junction enrichment in RNase R-treated samples. To normalize junction reads across libraries, the size factors ( size factor = median(x/geometric mean) ) based on the geometric mean ( geometric mean = product(x)^(1/length(x)) ) of common junctions in untreated and treated samples were calculated, with x being a vector containing the number of reads per junction. We then compared read coverage for junctions outside and inside the BSJ for each gene and used the log2-change of junctions outside the backsplice junction to construct the expected background distribution of change in junction coverage upon RNase R treatment. The observed coverage change of junctions inside the backsplice was then compared to the expected change in the background distribution, and junctions with a log2-change outside the 90% confidence interval were assigned as circRNA junctions. A loose cut-off was chosen here because the junctions involved can show a decrease in coverage if their linear isoform was present at high levels before treatment (degradation levels of linear isoforms do not correlate with the enrichment levels of circRNAs). Next, we reconstructed a splicing graph for each circRNA candidate, in which network nodes are exons connected by splice junctions (edges). Connections between nodes are weighted by the coverage in the RNase R-treated samples. The resulting network graph is directed (because of the known circRNA start and stop coordinates), acyclic (because splicing always proceeds in one direction), weighted and relatively small. We used a simple breadth-first-search algorithm to traverse the graph and to define the strength of each possible isoform by its mean coverage. For further analyses, only the strongest isoform was considered.
A gtf-file with the circRNA annotation for each species has been deposited (mdCircRNAs.gtf, mmCircRNAs.gtf, rnCircRNAs.gtf, rmCircRNAs.gtf, hsCircRNAs.gtf).
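The isoform selection on the splicing graph can be illustrated with a short breadth-first enumeration of all exon paths between the start and stop exon. This is a sketch under stated assumptions rather than the submitters' implementation: the nested-dictionary graph layout, the function name and the score of 0.0 for a single-exon circRNA are all hypothetical choices.

```python
from collections import deque
from statistics import mean


def strongest_isoform(edges, start, end):
    """Return (exon path, mean junction coverage) of the strongest isoform.

    edges maps each exon to {downstream exon: junction coverage}.
    The graph is assumed to be directed and acyclic, so the search ends.
    """
    best_path, best_score = None, float("-inf")
    queue = deque([([start], [])])  # (exons visited, coverages collected)
    while queue:
        path, coverages = queue.popleft()
        exon = path[-1]
        if exon == end:
            # Assumption: a single-exon circRNA has no junctions, score 0.0.
            score = mean(coverages) if coverages else 0.0
            if score > best_score:
                best_path, best_score = path, score
            continue
        for next_exon, coverage in edges.get(exon, {}).items():
            queue.append((path + [next_exon], coverages + [coverage]))
    return best_path, best_score
```

For a graph such as `{"e1": {"e2": 10, "e3": 4}, "e2": {"e3": 8}}`, the path through `e2` has a mean coverage of 9 and is preferred over the exon-skipping path with a coverage of 4. Enumerating every path is only practical because these graphs are small, as noted above.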
Genome_build: opossum: monDom5 (ensembl release 75, feb 2014); mouse: mm10 (ensembl release 75, feb 2014); rat: rn5 (ensembl release 75, feb 2014); rhesus macaque: rheMac2 (ensembl release 77, oct 2014); human: hg38 (ensembl release 77, oct 2014)
Supplementary_files_format_and_content: tab-delimited text files include CPM values for all circRNAs.
Supplementary_files_format_and_content: gtf-files contain the circRNA annotation (genes, transcripts and exons) for each species.
 
Submission date Nov 25, 2020
Last update date Sep 21, 2021
Contact name Franziska Gruhl
Organization name SIB Swiss Institute of Bioinformatics
Street address University of Lausanne, Quartier Sorge
City Lausanne
ZIP/Postal code 1015
Country Switzerland
 
Platform ID GPL17021
Series (1)
GSE162152 Identification and evolutionary comparison of circular RNAs in five mammalian species and three organs.
Relations
BioSample SAMN16912860
SRA SRX9582592

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
