Introduction

mirSTP is a computational tool for predicting transcription start sites (TSS) of human intergenic miRNAs at high resolution using GRO/PRO-seq data.

mirSTP relies on two features of GRO/PRO-seq data to make its predictions: (1) divergent sharp peaks around transcription start sites, and (2) continuous signal over actively transcribed regions.

 

Installation

1. Requirements: R and bedtools

Install R and bedtools (v2.24.0), and add the bin directory of each to your executable path ($PATH).

2. Install mirSTP

unzip mirSTP.zip            #Unzip the file
cd mirSTP/                  #Change directories into the folder
chmod 755 bin/mirSTP        #Change the mode of executable files


#Add the mirSTP scripts to the shell search path ($PATH). This step is optional.
#For example, if mirSTP is installed at /home/usrname/mirSTP:
export PATH=/home/usrname/mirSTP/bin/:$PATH
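
Before running mirSTP, it can help to confirm that the required tools are actually visible on $PATH. The following is a minimal sketch, not part of mirSTP itself: check_tool is a hypothetical helper name, and it only tests whether each command can be found, not whether the installed version is the one listed above.

```shell
# check_tool is a hypothetical helper (not shipped with mirSTP).
# It reports whether a command can be found on $PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1 (not on PATH)"
    return 1
  fi
}

# R and bedtools are the stated requirements; mirSTP is only on $PATH
# if the optional export step above was performed.
for tool in R bedtools mirSTP; do
  check_tool "$tool" || echo "  -> install $tool or add its bin directory to PATH"
done

# bedtools reports its version with --version (v2.24.0 is the version named above).
if command -v bedtools >/dev/null 2>&1; then
  bedtools --version
fi
```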

Input and output

mirSTP takes GRO/PRO-seq read alignments and reports the predicted miRNA TSSs at stringent, medium, and relaxed cutoff levels.

1. Input

mirSTP takes hg19 alignment files in bed or bam format as input. Multiple samples should be given as a space-separated list.
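
Because the usage notes ask for BED input with 6 columns, a quick format check before a run can catch malformed files early. This is a sketch under assumptions: check_bed6 is a hypothetical helper, it assumes tab-separated fields with no track or comment lines, and it assumes the sixth column holds the strand as + or -, as in the standard BED layout. mirSTP may accept or reject files differently.

```shell
# check_bed6 is a hypothetical helper (not part of mirSTP).
# It prints one message per malformed line and returns non-zero if any is found.
check_bed6() {
  awk -F'\t' '
    # Every record must have exactly 6 tab-separated fields.
    NF != 6 {
      printf "line %d: expected 6 columns, found %d\n", NR, NF
      bad = 1
      next
    }
    # Columns 2 and 3 (start, end) must be non-negative integers.
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "line %d: start/end are not integers\n", NR
      bad = 1
      next
    }
    # Column 6 is assumed to be the strand.
    $6 != "+" && $6 != "-" {
      printf "line %d: strand is %s, expected + or -\n", NR, $6
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Example call on a hypothetical file name:
# check_bed6 K562_groseq.bed && echo "BED looks OK"
```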

2. Output

mirSTP writes tab-delimited text files containing the predicted TSSs for active intergenic miRNAs.

**_mirSTP_stringent.txt: predicted TSSs at the stringent cutoff level
**_mirSTP_medium.txt: predicted TSSs at the medium cutoff level
**_mirSTP_relax.txt: predicted TSSs at the relaxed cutoff level

Each output file has the following fields:

Field        Description
miRNA        miRNA name
Chr          Chromosome of the miRNA
TSS          Genomic coordinate of the predicted TSS
Strand       Strand of the miRNA
Score_plus   Log-likelihood score for the sharp peaks on the sense strand
Score_minus  Log-likelihood score for the sharp peaks on the antisense strand
Pvalue_gb    P-value for the activity of the gene body
Num_5k       Minimum number of reads among the continuous 5 kb windows
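
To view predicted TSSs in a genome browser or intersect them with bedtools, the output can be converted to BED. The sketch below is illustrative only and rests on several assumptions that are not confirmed here: the columns appear in the order listed above, a header row may or may not be present, and the TSS coordinate is 1-based (so the BED start is TSS minus 1). tss_to_bed is a hypothetical helper name, and the BED score column is set to 0 rather than derived from the mirSTP scores.

```shell
# tss_to_bed is a hypothetical helper (not part of mirSTP).
# Assumed input columns: miRNA, Chr, TSS, Strand, Score_plus, Score_minus, Pvalue_gb, Num_5k
# Output: 6-column BED with a 1 bp interval at each predicted TSS.
tss_to_bed() {
  awk -F'\t' -v OFS='\t' '
    # Skip the first line if its TSS field is not numeric (i.e. it is a header).
    NR == 1 && $3 !~ /^[0-9]+$/ { next }
    # ASSUMPTION: TSS is 1-based, so BED start = TSS - 1 and BED end = TSS.
    { print $2, $3 - 1, $3, $1, 0, $4 }
  ' "$1"
}

# Example call; the input name assumes the output prefix K562 was used:
# tss_to_bed K562_mirSTP_stringent.txt > K562_stringent_tss.bed
```

If the coordinates turn out to be 0-based, the start and end columns would need to be adjusted accordingly.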


Usage

Usage: mirSTP -i bed/bam files -t technique -f format -o outputname
e.g.:  mirSTP -i K562_groseq.bam -t GRO -f bam -o K562

-i [bed|bam file(s)]  Required. hg19 read alignment files in BED (6 columns) or BAM format; separate multiple files with spaces.
-t [GRO|PRO]          Technique used to generate the data (default: GRO).
-f [bed|bam]          Alignment file format (default: bam).
-o [string]           Required. Prefix of the output file names.
-h                    Print the help message.
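
When several samples are passed to -i, a small wrapper can confirm that every input file exists before the run starts. This is a sketch only: run_mirstp and the DRY_RUN variable are hypothetical names introduced here, and the wrapper uses nothing beyond the -i, -t, -f, and -o options documented above.

```shell
# run_mirstp is a hypothetical wrapper (not shipped with mirSTP).
# Usage: run_mirstp PREFIX TECHNIQUE FORMAT FILE [FILE...]
# Set DRY_RUN=1 (also hypothetical) to print the command instead of running it.
run_mirstp() {
  mirstp_prefix=$1
  mirstp_technique=$2
  mirstp_format=$3
  shift 3

  # Refuse to start if any input file is missing.
  for mirstp_file in "$@"; do
    if [ ! -f "$mirstp_file" ]; then
      echo "missing input: $mirstp_file" >&2
      return 1
    fi
  done

  # Multiple input files are passed to -i as a space-separated list.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo mirSTP -i "$@" -t "$mirstp_technique" -f "$mirstp_format" -o "$mirstp_prefix"
  else
    mirSTP -i "$@" -t "$mirstp_technique" -f "$mirstp_format" -o "$mirstp_prefix"
  fi
}
```

For instance, with two hypothetical replicate files rep1.bam and rep2.bam present in the working directory and DRY_RUN set to 1, calling run_mirstp K562 PRO bam rep1.bam rep2.bam prints the mirSTP command line that would be executed.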


Reference

Accurate identification of microRNA transcription start sites from nascent RNA sequencing. Nucleic Acids Res. 2017 Jul 27;45(13):e121. doi: 10.1093/nar/gkx318. PMID: 28460090

Contacts