PeakID: fast elastic peak detection for mass spectrometry data mining

Xin Zhang
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University

Dennis Shasha
Department of Computer Science
Courant Institute of Mathematical Sciences
New York University

Yang Song
Department of Computer Science
New Jersey Institute of Technology

Jason Wang
Department of Computer Science
New Jersey Institute of Technology
wangj@njit.edu



Introduction

We present PeakID, a software tool for fast elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data. PeakID takes 2D LC-MS data as input and locates all peaks in the data across multiple window sizes of interest.



Installation

The PeakID programs are written in C++. They were compiled and tested on a Dell PC running the Microsoft Windows XP operating system.
Here, we post the source code of the software and provide instructions for compiling and running the programs.



Input

All input files are text files. In the training phase, PeakID uses a state-space algorithm to find the topology and structure of an efficient Shifted Aggregation Tree, which it later uses for peak detection. To train the software tool, you need to provide two input files: a sample data file (sample.txt) and a threshold file (thresh.txt). You also need to provide the name of a file (tree.txt) to which the computed topology and structure of the efficient Shifted Aggregation Tree, i.e., the shift, shadow size, and degree of each level, will be written.

To detect peaks, you need to provide three input files: an input data file (input.txt), a threshold file (thresh.txt), and the Shifted Aggregation Tree structure file (tree.txt). The sample data file (sample.txt) and the input data file (input.txt) have exactly the same format.

For your convenience, we have included a copy of each of the above text files in this package.



Output

The software tool displays the output on the terminal. You can redirect the output to a file. The output comprises a list of tuples as follows:
     Starting_position, Window_size, Sum_of_intensity_values
Each tuple indicates a peak in the time window that begins at Starting_position and spans Window_size time points; the sum of the intensity values within this window equals Sum_of_intensity_values. For every reported tuple, Sum_of_intensity_values is greater than or equal to the threshold associated with its Window_size.
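
To make the meaning of these tuples concrete, the following is a minimal brute-force sketch in C++ that produces tuples of the same shape. It is not the Shifted Aggregation Tree algorithm used by PeakID and is not taken from the PeakID source code; it only illustrates the reporting rule described above. The names Peak, findPeaksNaive and printPeaks are hypothetical, and the sketch assumes a single one-dimensional intensity trace, 0-based starting positions, and one threshold per window size.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// One reported peak, mirroring the three fields of an output tuple.
// (Hypothetical type; PeakID's internal representation may differ.)
struct Peak {
    std::size_t start;   // Starting_position (0-based here, an assumption)
    std::size_t window;  // Window_size, in time points
    double sum;          // Sum_of_intensity_values over the window
};

// Brute-force reference: for each window size windowSizes[k] with threshold
// thresholds[k], report every window whose intensity sum is >= the threshold.
// Prefix sums make each window sum an O(1) lookup.
std::vector<Peak> findPeaksNaive(const std::vector<double>& intensity,
                                 const std::vector<std::size_t>& windowSizes,
                                 const std::vector<double>& thresholds) {
    assert(windowSizes.size() == thresholds.size());

    // prefix[i] holds the sum of intensity[0] .. intensity[i-1].
    std::vector<double> prefix(intensity.size() + 1, 0.0);
    for (std::size_t i = 0; i < intensity.size(); ++i) {
        prefix[i + 1] = prefix[i] + intensity[i];
    }

    std::vector<Peak> peaks;
    for (std::size_t k = 0; k < windowSizes.size(); ++k) {
        const std::size_t w = windowSizes[k];
        if (w == 0 || w > intensity.size()) {
            continue;  // no window of this size fits in the data
        }
        for (std::size_t s = 0; s + w <= intensity.size(); ++s) {
            const double sum = prefix[s + w] - prefix[s];
            if (sum >= thresholds[k]) {
                peaks.push_back({s, w, sum});
            }
        }
    }
    return peaks;
}

// Print one tuple per line: Starting_position, Window_size, Sum_of_intensity_values.
void printPeaks(const std::vector<Peak>& peaks) {
    for (const Peak& p : peaks) {
        std::cout << p.start << ", " << p.window << ", " << p.sum << '\n';
    }
}
```

For example, with the intensity values 1, 5, 6, 1, window sizes 1 and 2, and thresholds 6 and 11 respectively, findPeaksNaive reports two peaks: the window of size 1 starting at position 2 (sum 6) and the window of size 2 starting at position 1 (sum 11). The brute-force scan examines every window of every size, which is the cost that a faster detection structure is meant to avoid.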



Usage

SAT.exe must be run from the DOS environment (the Windows command prompt).



Download Issues

Some browsers open the text files, Web page manuals, and programs instead of starting a download. If this happens, try right-clicking on the link and choosing an option named "Save Target As..." or similar. If a separate window pops up, click "File" on the top menu bar of the window and then click "Save As" to save the file.