Data Availability


Processed Data

All published data generated with the PARIS or SHARC methods are available in BAM format. File name prefixes encode the experimental conditions, as described in the table below.

Name | pri_crssant | prigap1 | prigap1_filtered | prigapm | prigapm_filtered | prihomo | pritrans | gaplen | seglen | fastqc

Species
HS - Homo sapiens
MM - Mus musculus
VIR - Virus
IV - In vitro
Cell Type // Tissue // Strain
Verbatim - the cell type, tissue, or strain name is written out as-is.
Crosslinker
AMT - Aminomethyltrioxsalen
Amo - Amotosalen
DPI - Dipicolinic acid imidazolide, the imidazolide of 2,6-pyridinedicarboxylic acid (SHARC)
FA - Formaldehyde
Crosslinker concentrations are given in either mg/mL or molar (M).
As a rule of thumb, the psoralens (AMT, Amo) are given in mg/mL and the other crosslinkers in M.
Proximity Ligase
T4 - T4 RNA Ligase 1
Mth - Mth RNA Ligase
Ligation Incubation Time
1, 2, 12, or 24 hours of incubation.
Exonuclease Trimming Time
0, 1, 2, 12, or 24 hours of incubation.
Samples labeled exo-0h are considered "non-trimmed".
File Suffixes (Types)
pri_crssant - CRSSANT precursor file; prigap1_filtered and pritrans combined, with the "SA" tag (column 21) removed.
prigap1 - Non-continuous alignments, each with 1 gap.
prigapm - Non-continuous alignments, each with more than 1 gap.
"_filtered" - Alignments filtered to remove short gaps (1-2 nt) and splice junctions.
prihomo - Non-continuous alignments, where 2 arms overlap each other.
pritrans - Non-continuous alignments, with 2 arms on different strands or chromosomes.
gaplen - Distribution of gap lengths among the alignments.
seglen - Distribution of arm lengths among the alignments.
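As a convenience, the suffix list above can be turned into a small lookup. The following is a minimal Python sketch, not part of the published pipeline. It assumes the suffix is the last part of the file name before the extension; the file name in the usage note is hypothetical, and the function name `file_type` is illustrative.

```python
import os

# Suffix descriptions taken from the "File Suffixes (Types)" list above.
SUFFIXES = {
    "pri_crssant": "CRSSANT precursor file",
    "prigap1_filtered": "1-gap alignments, short gaps and splice junctions removed",
    "prigap1": "non-continuous alignments, each with 1 gap",
    "prigapm_filtered": "multi-gap alignments, short gaps and splice junctions removed",
    "prigapm": "non-continuous alignments, each with more than 1 gap",
    "prihomo": "non-continuous alignments where 2 arms overlap",
    "pritrans": "arms on different strands or chromosomes",
    "gaplen": "distribution of gap lengths",
    "seglen": "distribution of arm lengths",
}

def file_type(filename):
    """Return the known suffix that ends the file name, or None."""
    # Assumption: the suffix sits directly before the file extension.
    stem, _ext = os.path.splitext(os.path.basename(filename))
    for suffix in SUFFIXES:
        if stem.endswith(suffix):
            return suffix
    return None
```

For example, `file_type("HS_example_prigap1_filtered.bam")` returns `"prigap1_filtered"`, and `SUFFIXES[...]` then gives the matching description.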

Reference Genomes

The reference genomes used for alignment and annotation are provided below for convenience.

Normal Reference

Curated Reference

Special Reference

Normal
Default version of a reference genome maintained by the Genome Reference Consortium (GRC). These genomes are used as a baseline for most genomic analyses such as gene annotation, variant calling, and comparative genomics. They represent a typical species sequence, useful in general-purpose applications.
Curated
These genomes have had specific repeated and abundant RNA regions masked with "N" and added as separate "chromosomes". Curated genomes are particularly useful in studies focusing on structural variants, offering more accurate results in these areas. These references may also undergo manual review for better representation of complex genomic regions.
Special
Special reference genomes are created by concatenating two or more RNA genes known to have interactions, focusing on specific biological processes.
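To illustrate the idea behind the Curated references, here is a minimal Python sketch of masking a region with "N" and adding its sequence as a separate "chromosome". This is one reading of the description above, not the procedure used to build the provided files. The function name `curate`, the record names, and the 0-based half-open coordinates are all assumptions.

```python
def curate(genome, regions):
    """Mask regions with 'N' and add each one as its own record.

    genome:  dict mapping record name -> sequence string
    regions: list of (new_name, chrom, start, end), 0-based, end-exclusive
    """
    curated = dict(genome)
    for new_name, chrom, start, end in regions:
        # Extract from the unmasked input so overlapping regions stay intact.
        curated[new_name] = genome[chrom][start:end]
        seq = curated[chrom]
        # Replace the region with N's; the chromosome length is unchanged.
        curated[chrom] = seq[:start] + "N" * (end - start) + seq[end:]
    return curated
```

Because masking preserves the chromosome length, coordinates of all other features on that chromosome stay valid.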

Raw Data

Option 1. Bulk download data from a whole project via an NCBI GEO accession list.

    Use the script sra_grab.sh for rapid download and extraction of data via a GEO accession page.

  • At the corresponding GEO Accession Display, select SRA Run Selector.
  • Download the "Accession List", which should contain SRRXXXXX lines that correspond to the correct data.
  • Modify the shell script sra_grab.sh (lines 20, 21) for input file path (SRR_LIST_FILE="SRR_Acc_List.txt") and output directory path (OUTPUT_DIR="./geo_rnaseq_data") if appropriate.
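For readers who want to see what a bulk download from an accession list involves, the following Python sketch reads the SRR lines and builds one SRA Toolkit `fasterq-dump` command per run. It is not the contents of sra_grab.sh, which may work differently. It assumes the SRA Toolkit is installed; the function names are illustrative, and the commands are only constructed, not executed.

```python
def read_accessions(text):
    """Return the SRR run accessions found in an accession list."""
    lines = (line.strip() for line in text.splitlines())
    # Keep only run accessions; skip blank lines and anything else.
    return [line for line in lines if line.startswith("SRR")]

def build_commands(accessions, output_dir="./geo_rnaseq_data"):
    """Build one fasterq-dump command (as an argument list) per run."""
    return [
        # --split-files writes each mate of a paired-end run to its own file.
        ["fasterq-dump", "--split-files", "--outdir", output_dir, acc]
        for acc in accessions
    ]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the output directory and accession list have been checked.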


Option 2. Individual sample (direct) download through NCBI GEO linked files.

  • At the bottom of GEO Accession Display, select SRA Run Selector.
  • In the run table, select the SRRXXXXX file in the Run column that corresponds to the correct data.
  • Under the FASTA/FASTQ download tab, click to download directly.
  • To download to a server, copy the link that opens in your browser upon clicking the FASTQ link; use wget or curl.

WARNING: Always check whether a study uses paired-end or single-end sequencing. If downloading paired-end data through Option 2, be sure to download both files of the pair.
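A quick way to catch an incomplete pair after downloading is to group the FASTQ files by run accession. The Python sketch below assumes files are named `SRRxxxxx_1.fastq` and `SRRxxxxx_2.fastq` (optionally gzipped), which is the naming produced by the SRA Toolkit when splitting mates; files downloaded another way may be named differently. The function name is illustrative.

```python
import re
from collections import defaultdict

MATE_PATTERN = re.compile(r"(SRR\d+)_([12])\.fastq(\.gz)?$")

def runs_without_both_mates(filenames):
    """Return accessions that have a _1 or _2 file but not both."""
    mates = defaultdict(set)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            mates[match.group(1)].add(match.group(2))
    return sorted(acc for acc, seen in mates.items() if seen != {"1", "2"})
```

A flagged run is either single-end or missing a mate, so confirm its layout in the SRA Run Selector table before re-downloading.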
