SDISCOVER: finding active motifs in a set of protein or DNA sequences

Jason Wang
Department of Computer Science
New Jersey Institute of Technology

Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University

Gung-Wei Chirn
Novartis Pharmaceuticals


We describe a method for discovering active (or frequently occurring) motifs in a set of protein or DNA sequences. SDISCOVERY takes a set of protein or DNA sequences as input and produces the collection of active motifs found in that set. A second program, SSORT, sorts the output of SDISCOVERY by motif length and deletes every substring motif that has the same occurrence number as one of its superstring motifs.
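The pruning rule applied by SSORT can be illustrated with a short C sketch. This is not the distributed SSORT source: the `MotifEntry` record, the function `sort_and_prune`, the longest-first sort order, and the use of plain substring matching (real motifs may carry wildcard segments) are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one motif and its occurrence number. */
typedef struct {
    const char *motif;
    int occurrences;
} MotifEntry;

/* qsort comparator: longer motifs first (assumed order);
 * ties are broken alphabetically so the result is deterministic. */
static int by_length_desc(const void *a, const void *b)
{
    const MotifEntry *x = a;
    const MotifEntry *y = b;
    size_t lx = strlen(x->motif);
    size_t ly = strlen(y->motif);
    if (lx != ly)
        return lx > ly ? -1 : 1;
    return strcmp(x->motif, y->motif);
}

/* Sort the entries in place, then drop every motif that is a proper
 * substring of an already kept motif with the same occurrence number.
 * The kept entries are compacted to the front; their count is returned. */
size_t sort_and_prune(MotifEntry *e, size_t n)
{
    size_t kept = 0;
    qsort(e, n, sizeof *e, by_length_desc);
    for (size_t i = 0; i < n; i++) {
        int redundant = 0;
        for (size_t j = 0; j < kept && !redundant; j++) {
            if (e[j].occurrences == e[i].occurrences &&
                strlen(e[j].motif) > strlen(e[i].motif) &&
                strstr(e[j].motif, e[i].motif) != NULL)
                redundant = 1;
        }
        if (!redundant)
            e[kept++] = e[i];
    }
    return kept;
}
```

Because the entries are visited longest first, comparing each motif only against the motifs already kept is sufficient: a dropped motif is always contained in a kept motif with the same occurrence number, so anything it would have eliminated is eliminated by that kept motif as well.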


The programs are written in the C programming language. They run on a Sun SPARC workstation under SunOS 4.1.2.
The links below provide the source code of the software we developed and the steps to compile and run it.


    Input file format: FASTA format; see file SAMPLE.
    Note the following items concerning sequences.

    Below is an example of FASTA format input which is in the SAMPLE file.
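The FASTA convention itself is simple: each record starts with a header line beginning with '>', followed by one or more lines of sequence characters. A minimal C sketch of a check on such input before it is passed to sdiscovery might look as follows. It is not part of the SDISCOVER distribution, and the `FastaStats` type and `fasta_stats` function are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical summary of FASTA text held in memory. */
typedef struct {
    size_t records;   /* number of header lines (lines starting with '>') */
    size_t residues;  /* sequence characters outside header lines */
} FastaStats;

/* Scan NUL-terminated FASTA text once, counting records and residues.
 * Whitespace inside sequence lines is ignored. */
FastaStats fasta_stats(const char *text)
{
    FastaStats s = {0, 0};
    int at_line_start = 1;
    int in_header = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            at_line_start = 1;
            in_header = 0;
            continue;
        }
        if (at_line_start && *p == '>') {
            s.records++;
            in_header = 1;
        } else if (!in_header && *p != '\r' && *p != ' ' && *p != '\t') {
            s.residues++;
        }
        at_line_start = 0;
    }
    return s;
}
```

Calling `fasta_stats` on the contents of an input file gives a quick sanity check that the file holds the expected number of sequences.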


      When you run the sdiscovery program, the following lines appear at the command prompt. To accept all default parameter values, simply press Enter.

      Sample output (the data.out file) produced from the input file SAMPLE is shown below.

      After running sdiscovery and obtaining its output in data.out, we sort the results with the ssort program.
      Running ssort data.out > sorted.output writes the sorted output to sorted.output, which is shown below.


      Jason T. L. Wang, Thomas G. Marr, Dennis Shasha, Bruce A. Shapiro and Gung-Wei Chirn, "Discovering Active Motifs in Sets of Related Protein Sequences and Using Them for Classification," Nucleic Acids Research, Vol. 22, No. 14, Aug. 1994, pp. 2769-2775.

      Download Issues

      Some browsers open the PDF file, the Web page manuals, and the programs instead of starting a download. If this happens, right-click the link and choose "Save Target As..." or a similarly named option. If a separate window pops up, click "File" in the window's top menu bar and then "Save As" to save the file.