Probalign: multiple sequence alignment using partition function posterior probabilities

Probalign uses partition function posterior probability estimates to compute maximum expected accuracy multiple sequence alignments. It performs statistically significantly better than the leading alignment programs Probcons v1.1, MAFFT v5.851, and MUSCLE v3.6 on the BAliBASE 3.0, HOMSTRAD, and OXBENCH benchmarks. Probalign's improvements are largest on datasets containing N/C-terminal extensions and on datasets with long sequences of heterogeneous length. On heterogeneous-length datasets containing repeats, Probalign's alignment accuracy is 10% and 15% higher than that of the other three methods when the standard deviation of sequence length is at least 300 and 400, respectively.
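To make the idea of partition function posterior probabilities concrete, here is a minimal sketch for a single pair of sequences. It is not Probalign's implementation: it assumes a linear gap penalty and a toy match/mismatch score, and the function name `posterior_match_probabilities` and every parameter value (`temperature`, `match`, `mismatch`, `gap`) are hypothetical placeholders chosen for illustration only. The forward and reverse partition function tables are combined to estimate the probability that residue i of one sequence aligns with residue j of the other.

```python
import math


def posterior_match_probabilities(a, b, temperature=1.0,
                                  match=2.0, mismatch=-1.0, gap=-2.0):
    """Return (post, z) for sequences a and b.

    post[i][j] is the posterior probability that a[i] is aligned to b[j]
    under a Boltzmann distribution over all global alignments.
    Scoring here is a toy assumption (linear gaps, match/mismatch only).
    """
    n, m = len(a), len(b)

    def w(x, y):
        # Boltzmann weight of aligning residues x and y.
        return math.exp((match if x == y else mismatch) / temperature)

    wg = math.exp(gap / temperature)  # Boltzmann weight of one gap position

    # Forward table: zf[i][j] sums the weights of all alignments of a[:i], b[:j].
    zf = [[0.0] * (m + 1) for _ in range(n + 1)]
    zf[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += zf[i - 1][j - 1] * w(a[i - 1], b[j - 1])
            if i > 0:
                total += zf[i - 1][j] * wg
            if j > 0:
                total += zf[i][j - 1] * wg
            zf[i][j] = total

    # Reverse table: zb[i][j] sums the weights of all alignments of a[i:], b[j:].
    zb = [[0.0] * (m + 1) for _ in range(n + 1)]
    zb[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += zb[i + 1][j + 1] * w(a[i], b[j])
            if i < n:
                total += zb[i + 1][j] * wg
            if j < m:
                total += zb[i][j + 1] * wg
            zb[i][j] = total

    z = zf[n][m]  # total partition function over all global alignments
    post = [[zf[i][j] * w(a[i], b[j]) * zb[i + 1][j + 1] / z
             for j in range(m)] for i in range(n)]
    return post, z


if __name__ == "__main__":
    post, z = posterior_match_probabilities("ACGT", "AGT")
    for row in post:
        print(" ".join(f"{p:.3f}" for p in row))
```

Because each alignment pairs a residue with at most one partner, every row and column of the resulting matrix sums to at most 1. In this toy model, raising `temperature` flattens the distribution toward treating all alignments alike, while lowering it concentrates probability on the highest-scoring alignments.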

New and fast Probalign v1.4 now available for download here.

Older versions of Probalign available here.

Please contact Usman Roshan (usman@njit.edu) for support.

Citation: U. Roshan and D. R. Livesay, Probalign: multiple sequence alignment using partition function posterior probabilities, Bioinformatics, 22(22):2715-21, 2006 (PDF)

Data used in the paper:

N/C extension simulated data. Includes all programs used for simulating the data as well as the simulated datasets.

BAliBASE 2.0 repeat alignments. True alignments are in FASTA format, with core regions in upper case and ambiguous regions in lower case. The qscore program can be used for evaluating alignment accuracy.
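The upper-case/lower-case convention above can be read programmatically. The following is a minimal sketch, not part of the distributed data or of qscore: the helper names `read_fasta` and `core_columns` are hypothetical, and it assumes that a column counts as core when it contains at least one residue and every residue in it is upper case, with gap characters ignored.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def core_columns(alignment):
    """Return 0-based indices of columns whose residues are all upper case.

    Assumption: gap characters (anything non-alphabetic) are ignored, and a
    column with no residues at all is not treated as core.
    """
    rows = list(alignment.values())
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    core = []
    for col in range(width):
        residues = [row[col] for row in rows if row[col].isalpha()]
        if residues and all(ch.isupper() for ch in residues):
            core.append(col)
    return core


if __name__ == "__main__":
    example = ">s1\nacGT-A\n>s2\n-cGTaA\n"
    print(core_columns(read_fasta(example)))
```

Restricting an accuracy score to these columns is one way to evaluate only the reliably aligned regions of a reference alignment.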

The BAliBASE 3.0, HOMSTRAD, and OXBENCH multiple sequence alignment benchmarks were obtained from the websites hosting the distributions.

Related: Probalign study for RNA-genome alignment here

Last updated May 2010