Status
Public on Sep 21, 2021
Title
rrmm_lv1
Sample type
SRA

Source name
Liver

Organism
Mus musculus
Characteristics
tissue: Liver
genotype: wild type
age: 9 weeks
treatment: RNase R-treated

Extracted molecule
total RNA
Extraction protocol
Tissue was dissected at the indicated ages (see section "Samples" in the GEO metadata). RNA from mouse, rat, rhesus macaque and human tissues was extracted using the RNeasy protocol (QIAGEN); for opossum, a TRIzol/chloroform-based protocol was used. Tissues were stored at -80 °C until library preparation. We used 5 µg of RNA per tissue as starting material for library preparation. The RNA was treated with RNase R for 1 h at 37 °C to degrade linear RNAs, followed by RNA purification with the RNA Clean & Concentrator-5 kit (Zymo Research) according to the manufacturer's protocol. Libraries were prepared from the purified RNA with the Illumina TruSeq Stranded Total RNA kit with Ribo-Zero Gold according to the kit protocol, with the following modifications to select larger fragments: (1) instead of the recommended 8 min at 68 °C for fragmentation, we incubated samples for only 4 min at 68 °C to increase fragment size; (2) in the final PCR clean-up after enrichment of the DNA fragments, we changed the 1:1 ratio of DNA to AMPure XP beads to a 0.7:1 ratio to favor binding of larger fragments. Library quality was assessed on a Fragment Analyzer, and libraries were sequenced on the Illumina HiSeq 2500 platform.

Library strategy
RNA-Seq
Library source
transcriptomic
Library selection
cDNA
Instrument model
Illumina HiSeq 2500

Description
mmCircRNAs.gtf
mmCircRNA_cpms.txt

Data processing
Read quality assessment and mapping: Basecalling was performed with the Illumina CASAVA software (v1.9), and read quality was assessed with FastQC (v0.10.1). The Ensembl annotations for opossum (monDom5), mouse (mm10), rat (rn5), rhesus macaque (rheMac2) and human (hg38) were used to build transcriptome indexes for mapping with TopHat2 (v2.0.11). TopHat2 was run with default settings, except that the --mate-inner-dist and --mate-std-dev options were set to 50 and 200, respectively. The mate inner distance was estimated from the Fragment Analyzer report.

CircRNA annotation: We developed a custom pipeline to detect circRNAs, which performs the following steps. Unmapped reads with a Phred quality value of at least 25 are used to generate 20 bp anchor pairs from the terminal 5' and 3' ends of each read. Anchors are remapped to the reference genome with bowtie2 (v2.1.0). Mapped anchor pairs are kept only if they 1) map to the same chromosome, 2) map to the same strand and 3) map at most 100 kb apart in the genome. Next, anchors are extended upstream and downstream of their mapping locus and are kept only if the pair can be extended to the full read length, allowing a maximum of two mismatches. Next, all unpaired reads (singletons) are selected from the accepted_hits.bam file generated by TopHat2 to assess whether the mate read (the second read of a paired-end sequencing read) of each anchor pair mapped between the backsplice coordinates. Anchor pairs are removed if 1) the mate did not map between the genomic backsplice coordinates, 2) the mate mapped to another backsplice junction or 3) the extension procedure could not reveal a clear breakpoint. Based on the remaining candidates, a backsplice index is built with bowtie2, and all reads are remapped to this index to increase read coverage by detecting reads that overlap the backsplice junction (BSJ) by fewer than 20 bp but at least 8 bp.
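The anchor-generation and pair-filtering steps above can be illustrated with a minimal Python sketch. It is not taken from the authors' ncSplice_circRNAdetection scripts: the `Anchor` record and the helper names are hypothetical, the quality threshold is assumed to apply to the mean Phred score of the read, and the 100 kb distance is assumed to be measured from the leftmost to the rightmost anchor coordinate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

ANCHOR_LEN = 20      # anchor length taken from each read end (bp)
MIN_PHRED = 25       # quality threshold (assumed: mean Phred score of the read)
MAX_SPAN = 100_000   # maximum genomic distance between paired anchors (bp)


@dataclass(frozen=True)
class Anchor:
    """Genomic mapping of one anchor (hypothetical record type)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based start of the mapped anchor
    end: int     # exclusive end of the mapped anchor


def make_anchor_pair(seq: str, quals: Sequence[int]) -> Optional[Tuple[str, str]]:
    """Return the terminal 5' and 3' anchors of an unmapped read, or None if it fails QC."""
    if len(seq) < 2 * ANCHOR_LEN or not quals or sum(quals) / len(quals) < MIN_PHRED:
        return None
    return seq[:ANCHOR_LEN], seq[-ANCHOR_LEN:]


def passes_pair_filters(a5: Anchor, a3: Anchor) -> bool:
    """Filters 1-3: same chromosome, same strand, at most MAX_SPAN apart."""
    if a5.chrom != a3.chrom or a5.strand != a3.strand:
        return False
    # Assumed distance definition: span from the leftmost to the rightmost anchor coordinate.
    span = max(a5.end, a3.end) - min(a5.start, a3.start)
    return span <= MAX_SPAN
```

The later extension, mate-read and backsplice-index steps depend on alignment data and are not modelled here.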
Candidate reads that were used to build the backsplice index but now map to another backsplice junction are removed. This procedure yields a first list of backsplice junctions. The set of scripts that performs the identification of putative BSJs, together with a short description of how to run the pipeline, is deposited on GitHub (https://github.com/Frenzchen/ncSplice_circRNAdetection).

CircRNA expression levels: CPM (counts per million) values for backsplice junctions were calculated for each tissue as follows:
counts = mean(counts_rep1, counts_rep2, counts_rep3)
totalMappedReads = mean(mappedReads_rep1, mappedReads_rep2, mappedReads_rep3)
CPM = counts * 10^6 / totalMappedReads
To distinguish putative backsplice junctions from the technical and biological noise background, the enrichment of the previously defined junctions in RNase R-treated samples was calculated. The enrichment was defined as the CPM increase in RNase R-treated versus untreated samples:
enrichment = CPM_RNaseR / CPM_untreated
Candidates with a log2 enrichment smaller than 1.5 were removed. Expression levels for all circRNAs can be found in the files mdCircRNA_cpms.txt (opossum), mmCircRNA_cpms.txt (mouse), rnCircRNA_cpms.txt (rat), rmCircRNA_cpms.txt (rhesus macaque) and hsCircRNA_cpms.txt (human).

CircRNA transcript reconstruction: To reconstruct the exon structure of circRNA transcripts in each tissue, we made use of the junction enrichment in RNase R-treated samples. To normalize junction reads across libraries, size factors were calculated from the junctions common to untreated and treated samples as
size factor = median(x / geometric mean), with geometric mean = product(x)^(1/length(x)),
where x is a vector containing the number of reads per junction.
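The CPM and enrichment formulas above translate directly into code. This is a minimal sketch assuming replicate counts and mapped-read totals are already available as plain lists; the function names are hypothetical, and the handling of zero CPM values is an assumption, since it is not described here.

```python
import math
from statistics import mean
from typing import Sequence

MIN_LOG2_ENRICHMENT = 1.5  # candidates below this log2 enrichment are removed


def cpm(counts_reps: Sequence[float], mapped_reads_reps: Sequence[float]) -> float:
    """CPM = mean(replicate counts) * 10^6 / mean(replicate mapped reads)."""
    counts = mean(counts_reps)
    total_mapped_reads = mean(mapped_reads_reps)
    return counts * 10**6 / total_mapped_reads


def passes_enrichment(cpm_rnase_r: float, cpm_untreated: float) -> bool:
    """Keep a junction if log2(CPM_RNaseR / CPM_untreated) >= MIN_LOG2_ENRICHMENT."""
    if cpm_rnase_r <= 0:
        return False  # not detected after RNase R treatment
    if cpm_untreated <= 0:
        return True   # assumption: detected only after RNase R counts as enriched
    return math.log2(cpm_rnase_r / cpm_untreated) >= MIN_LOG2_ENRICHMENT
```

A threshold of 1.5 on the log2 scale corresponds to roughly a 2.8-fold CPM increase upon RNase R treatment.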
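The size-factor normalization described above resembles a median-of-ratios scheme. The sketch below is one possible reading, not the authors' code: it assumes the geometric mean is taken per junction across samples, that "common junctions" are those with at least one read in every sample, and that normalized counts are obtained by dividing by the size factor. The function names and the input layout are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Median-of-ratios size factor per sample.

    counts maps each sample (e.g. "untreated", "rnase_r") to read counts per junction,
    with junctions in the same order for every sample.
    """
    samples = list(counts)
    n_junctions = len(counts[samples[0]])
    # "Common junctions" (assumed): junctions with at least one read in every sample.
    common = [j for j in range(n_junctions) if all(counts[s][j] > 0 for s in samples)]
    if not common:
        raise ValueError("no junctions shared by all samples")
    # geometric mean = product(x)^(1/length(x)), computed via logs to avoid overflow
    geo_mean = {
        j: math.exp(sum(math.log(counts[s][j]) for s in samples) / len(samples))
        for j in common
    }
    return {s: median([counts[s][j] / geo_mean[j] for j in common]) for s in samples}


def normalize(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Divide each sample's junction counts by its size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in row] for s, row in counts.items()}
```

Using the median of the per-junction ratios keeps a few strongly enriched circRNA junctions from dominating the library scaling.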
We then compared read coverage for junctions outside and inside the BSJ for each gene and used the log2-change of junctions outside the BSJ to construct the expected background distribution of coverage change upon RNase R treatment. The observed coverage change of junctions inside the BSJ was then compared to this background distribution, and junctions with a log2-change outside the 90% confidence interval were assigned as circRNA junctions. A loose cut-off was chosen here because circRNA junctions can still show a decrease in coverage if their linear isoform was highly abundant before treatment (degradation levels of linear isoforms do not correlate with the enrichment levels of circRNAs). Next, we reconstructed a splicing graph for each circRNA candidate, in which the nodes are exons and the edges are the splice junctions connecting them. Edges are weighted by their coverage in the RNase R-treated samples. The resulting graph is directed (because the circRNA start and stop coordinates are known), acyclic (because splicing always proceeds in one direction), weighted and relatively small. We used a simple breadth-first search to traverse the graph and defined the strength of each possible isoform as its mean coverage. For further analyses, only the strongest isoform was considered. A GTF file with the circRNA annotation for each species has been deposited (mdCircRNAs.gtf, mmCircRNAs.gtf, rnCircRNAs.gtf, rmCircRNAs.gtf, hsCircRNAs.gtf).
Genome_build: opossum: monDom5 (Ensembl release 75, Feb 2014); mouse: mm10 (Ensembl release 75, Feb 2014); rat: rn5 (Ensembl release 75, Feb 2014); rhesus macaque: rheMac2 (Ensembl release 77, Oct 2014); human: hg38 (Ensembl release 77, Oct 2014)
Supplementary_files_format_and_content: tab-delimited text files include CPM values for all circRNAs.
Supplementary_files_format_and_content: GTF files contain the circRNA annotation (genes, transcripts and exons) for each species.
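The junction classification step of the circRNA transcript reconstruction can be sketched as follows. Several details are assumptions because they are not specified here: the background interval is taken as the empirical 5th to 95th percentile range of the log2-changes of junctions outside the BSJ, a pseudocount of 1 avoids division by zero, and "outside the interval" is read one-sided (above its upper limit), on the assumption that circRNA junctions are protected from RNase R relative to the linear background. The function names are hypothetical.

```python
import math
from statistics import quantiles
from typing import Dict, List, Set


def log2_change(untreated: float, treated: float, pseudocount: float = 1.0) -> float:
    """log2 coverage change upon RNase R treatment (pseudocount is an assumption)."""
    return math.log2((treated + pseudocount) / (untreated + pseudocount))


def circ_junctions(outside: List[float], inside: Dict[str, float]) -> Set[str]:
    """Assign inside-BSJ junctions whose log2-change exceeds the background interval.

    outside: log2-changes of junctions outside the BSJ (background distribution).
    inside: junction id -> log2-change for junctions inside the BSJ.
    """
    # 19 cut points at 5%, 10%, ..., 95%; the first and last bound the central 90%.
    cuts = quantiles(outside, n=20, method="inclusive")
    upper = cuts[-1]
    # One-sided reading (assumption): protected junctions lie above the background.
    return {junction for junction, change in inside.items() if change > upper}
```

With a background such as `[-3.0, -2.8, -2.5, -2.4, -2.2, -2.1, -2.0, -1.8, -1.5, -1.0]`, a junction with a log2-change of -1.1 is still assigned: its coverage drops, but less than the linear background, which is the situation the loose cut-off is meant to accommodate.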
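The isoform selection on the splicing graph can be sketched with a breadth-first search that enumerates every path from the start exon to the stop exon and scores each path by the mean coverage of its edges. The adjacency-dictionary representation, the exon identifiers, the score given to single-exon circRNAs and the function name are assumptions; exhaustive path enumeration is only practical because, as noted in the data processing description, these graphs are small and acyclic.

```python
from collections import deque
from statistics import mean
from typing import Dict, List, Optional, Tuple

# exon -> {downstream exon: junction coverage in RNase R-treated samples}
Graph = Dict[str, Dict[str, float]]


def strongest_isoform(graph: Graph, start: str, stop: str) -> Optional[Tuple[List[str], float]]:
    """Enumerate start->stop paths breadth-first; return the path with the highest mean edge coverage."""
    best: Optional[Tuple[List[str], float]] = None
    queue = deque([(start, [start], [])])  # (current exon, exons on path, edge coverages)
    while queue:
        exon, path, coverages = queue.popleft()
        if exon == stop:
            # Single-exon circRNAs have no internal junction; score them as 0.0 (assumption).
            strength = mean(coverages) if coverages else 0.0
            if best is None or strength > best[1]:
                best = (path, strength)
            continue  # do not extend past the known circRNA end
        for next_exon, coverage in graph.get(exon, {}).items():
            queue.append((next_exon, path + [next_exon], coverages + [coverage]))
    return best
```

For a graph `{"e1": {"e2": 10, "e3": 2}, "e2": {"e3": 8, "e4": 12}, "e3": {"e4": 4}}` with start `e1` and stop `e4`, the path `e1, e2, e4` (mean coverage 11) is selected over `e1, e2, e3, e4` and `e1, e3, e4`.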

Submission date
Nov 25, 2020
Last update date
Sep 21, 2021
Contact name
Franziska Gruhl
Organization name
SIB Swiss Institute of Bioinformatics
Street address
University of Lausanne, Quartier Sorge
City
Lausanne
ZIP/Postal code
1015
Country
Switzerland

Platform ID
GPL17021
Series (1)
GSE162152
Identification and evolutionary comparison of circular RNAs in five mammalian species and three organs.

Relations
BioSample
SAMN16912860
SRA
SRX9582592
Supplementary data files not provided
SRA Run Selector
Raw data are available in SRA
Processed data are available on Series record