Introduction:

DiscoverR is a software tool for discovering structural repeats within one RNA secondary structure, or common patterns shared by two RNA secondary structures, where the structures contain no pseudoknots. The program is written in Java.

Installation:

1. Create a folder named DiscoverR.

2. Download the jar file below into the DiscoverR folder. The source code is also available as a zip archive.

       DiscoverR.jar

       DiscoverR.zip (source code)

3. Put the input data files (.struct files) in the DiscoverR folder.

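4. Run the tool with java -jar from a command prompt in the DiscoverR folder, as shown under Examples. A Java runtime must be installed.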

Usage:

       -p <p|r>
       Choose a program from:
              p: pair-wise comparison (firstRNA.struct secondRNA.struct)
              r: find the patterns repeated in one RNA (rna.struct)
       -d <dissimilarity threshold>; allowed range 0 to 0.2; default: 0.1
       -u <output file> file to receive the output; default: result.txt
       -m <minimum number of base pairs in each output structure>; default: 5
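To make the option list concrete, here is a minimal, hypothetical sketch of how the documented flags and defaults fit together. It is not DiscoverR's source code. The class name DiscoverROptions is invented for illustration, and treating -p as required (the usage text gives it no default) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented command-line options; this is NOT
// DiscoverR's actual source code, only an illustration of the flags and defaults.
class DiscoverROptions {
    String program;                                // -p: "p" (pair-wise) or "r" (repeats)
    double threshold = 0.1;                        // -d: documented default 0.1
    String output = "result.txt";                  // -u: documented default
    int minPairs = 5;                              // -m: documented default
    final List<String> inputs = new ArrayList<>(); // .struct file names

    // Returns the value that follows the flag at args[i], or fails clearly.
    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value after " + args[i]);
        }
        return args[i + 1];
    }

    static DiscoverROptions parse(String[] args) {
        DiscoverROptions o = new DiscoverROptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-p": o.program = valueAfter(args, i); i++; break;
                case "-d": o.threshold = Double.parseDouble(valueAfter(args, i)); i++; break;
                case "-u": o.output = valueAfter(args, i); i++; break;
                case "-m": o.minPairs = Integer.parseInt(valueAfter(args, i)); i++; break;
                default:   o.inputs.add(args[i]); // anything else is an input file
            }
        }
        // Assumption: -p is required, since the usage text gives it no default.
        if (!"p".equals(o.program) && !"r".equals(o.program)) {
            throw new IllegalArgumentException("-p must be p or r");
        }
        // Program p compares two structures; program r searches one.
        int expectedFiles = o.program.equals("p") ? 2 : 1;
        if (o.inputs.size() != expectedFiles) {
            throw new IllegalArgumentException("Expected " + expectedFiles + " .struct file(s)");
        }
        // Documented range for the dissimilarity threshold: 0 to 0.2 inclusive.
        if (o.threshold < 0 || o.threshold > 0.2) {
            throw new IllegalArgumentException("-d must be between 0 and 0.2");
        }
        return o;
    }
}
```

For the first example command, parsing the arguments -p p firstRNA.struct secondRNA.struct -d 0.1 -m 5 gives two input files and leaves the output file at its default, result.txt, because -u is absent.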

Examples:

java -jar DiscoverR.jar -p p firstRNA.struct secondRNA.struct -d 0.1 -m 5

The above command finds the common patterns shared by the query RNA secondary structure stored in the input file firstRNA.struct and the subject RNA secondary structure stored in the input file secondRNA.struct. Both files should be in the DiscoverR folder. Because -u is not given, the output is written to result.txt.

-d specifies the dissimilarity threshold used by the program; allowed values range from 0 to 0.2 inclusive; the default is 0.1. -m specifies the minimum number of base pairs in each output structure; the default is 5.

In the input file firstRNA.struct (respectively, secondRNA.struct), the query (respectively, subject) RNA secondary structure is represented in Vienna-style dot-parenthesis notation. Each input file has three lines: a header, the primary sequence, and the secondary structure. Each input sequence is at most 300 nt long. Sample input files firstRNA.struct and secondRNA.struct are available for download; they are plain-text files that can be opened in any text editor, such as Notepad.
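Before running the tool, it can help to check that an input file follows the three-line layout described above. The following is a hypothetical checker, not part of DiscoverR. It assumes that line 3 holds only the dot-parenthesis string (no appended energy value) and that only '.', '(' and ')' appear in it, which matches pseudoknot-free structures. The class name StructFile is invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for the documented .struct layout; not part of DiscoverR.
// Assumes: line 1 = header, line 2 = sequence, line 3 = dot-parenthesis structure only.
class StructFile {
    static final int MAX_LENGTH = 300; // documented limit, in nt

    final String header;
    final String sequence;
    final String structure;

    private StructFile(String header, String sequence, String structure) {
        this.header = header;
        this.sequence = sequence;
        this.structure = structure;
    }

    // Reads and validates a file such as firstRNA.struct.
    static StructFile read(Path path) throws IOException {
        return fromLines(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    static StructFile fromLines(List<String> lines) {
        // Ignore blank lines and surrounding whitespace (e.g. a trailing newline).
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty()) kept.add(t);
        }
        if (kept.size() != 3) {
            throw new IllegalArgumentException(
                "Expected 3 lines (header, sequence, structure), found " + kept.size());
        }
        String seq = kept.get(1);
        String st = kept.get(2);
        if (seq.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Sequence is longer than " + MAX_LENGTH + " nt");
        }
        if (seq.length() != st.length()) {
            throw new IllegalArgumentException("Sequence and structure lengths differ");
        }
        checkBalanced(st);
        return new StructFile(kept.get(0), seq, st);
    }

    // Only '.', '(' and ')' are accepted; every ')' must close an earlier '('.
    static void checkBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unmatched ')' at position " + (i + 1));
                }
            } else if (c != '.') {
                throw new IllegalArgumentException("Unexpected '" + c + "' at position " + (i + 1));
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unmatched '(' in structure");
        }
    }
}
```

Calling StructFile.read(Paths.get("firstRNA.struct")) either returns the three parsed fields or throws an exception naming the first problem it finds.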

The output file contains the common patterns found in the query RNA secondary structure (firstRNA.struct) and the subject RNA secondary structure (secondRNA.struct). Each pattern structure is represented in Vienna-style dot-parenthesis notation, and the beginning and ending positions of each contiguous subsequence in each pattern are printed. See the sample output file, result. A graphical display of the common patterns found in the sample RNA secondary structures, drawn with RNAViz, is also available. A second set of samples is provided as well: input files 1-firstRNA.struct and 1-secondRNA.struct, output file 1-result, and graphical display 1-figures.
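Because the structures contain no pseudoknots, their base pairs are properly nested, so a single stack is enough to recover every pair from a dot-parenthesis string. The hypothetical helper below (not part of DiscoverR; the class and method names are invented) lists the pairs of one pattern structure and checks it against a minimum pair count like the one set with -m. Positions are 1-based and relative to the string passed in; how they map onto the printed beginning and ending positions is not assumed here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of DiscoverR: lists the base pairs encoded by a
// pseudoknot-free dot-parenthesis string, using 1-based positions.
class DotBracket {
    static List<int[]> basePairs(String structure) {
        Deque<Integer> open = new ArrayDeque<>(); // positions of unmatched '('
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < structure.length(); i++) {
            char c = structure.charAt(i);
            if (c == '(') {
                open.push(i + 1);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unmatched ')' at position " + (i + 1));
                }
                // The most recent unmatched '(' pairs with this ')' (proper nesting).
                pairs.add(new int[]{open.pop(), i + 1});
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unmatched '(' at position " + open.peek());
        }
        return pairs;
    }

    // True if the structure has at least minPairs base pairs (cf. the -m option).
    static boolean meetsMinimum(String structure, int minPairs) {
        return basePairs(structure).size() >= minPairs;
    }
}
```

For example, basePairs("((..))") yields the pairs (2, 5) and (1, 6), in the order their closing brackets appear, and meetsMinimum("(((...)))", 5) is false because that structure has only three base pairs.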

java -jar DiscoverR.jar -p r rna.struct -d 0.1 -m 5

The above command finds the structural repeats in the RNA secondary structure stored in the input file rna.struct, which should be in the DiscoverR folder. A sample input file, rna.struct, is available for download. The output file (result.txt by default) contains the structural repeats found in the sample RNA secondary structure; see the sample output file, result. A graphical display of a structural repeat is also available.


For any suggestions, comments or queries about this website, please contact jason.t.wang@njit.edu.