Concurrent Bioinformatics Software for Discovering Genome-wide Patterns

Lonnie R. Welch
Ohio University


Abstract

An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory elements. Although a number of motif discovery methods have been developed, most lack the scalability needed to analyze large genomic data sets. The speaker will present WordSeeker, an enumerative motif discovery toolkit that uses multi-core and distributed computational platforms to enable scalable analysis of genomic data. A comprehensive suite of tests was conducted to measure the performance, speedup, and efficiency of WordSeeker. The toolkit's scalability made it possible to analyze the entire genome of Arabidopsis thaliana, and the results were integrated into the Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker is deployed on the Glenn cluster at the Ohio Supercomputer Center. By making effective use of concurrent computing platforms, WordSeeker enables the identification of putative functional elements in genomic data sets, which facilitates the analysis of large volumes of sequenced genomic data.
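Enumerative motif discovery typically starts by counting every word (k-mer) of a fixed length across a set of sequences, then ranks the words by how over-represented they are. The minimal sketch below is not WordSeeker's implementation, and the function names are hypothetical. It shows only the counting step and how it splits into independent per-sequence tasks whose partial counts are merged, the property that lets such work be spread across cores or cluster nodes. Python threads are used here only to show the split-and-merge structure. Because of CPython's global interpreter lock, real multi-core speedup would require processes or a distributed framework such as MPI. Statistical scoring of the words is omitted.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Only unambiguous nucleotides are counted; words containing N etc. are skipped.
NUCLEOTIDES = frozenset("ACGT")


def count_words(sequence: str, k: int) -> Counter:
    """Hypothetical helper: count every length-k word in a single sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= NUCLEOTIDES:
            counts[word] += 1
    return counts


def enumerate_words(sequences: Iterable[str], k: int, workers: int = 4) -> Counter:
    """Hypothetical driver: count words per sequence in parallel tasks, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each sequence is an independent task; partial counts are summed.
        for partial in pool.map(lambda s: count_words(s, k), sequences):
            total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


if __name__ == "__main__":
    counts = enumerate_words(["ACGTACGT", "TTACGNNA"], k=3, workers=2)
    print(counts.most_common(3))
```

In performance tests of this kind of decomposition, speedup and efficiency on p workers are conventionally defined as S(p) = T(1) / T(p) and E(p) = S(p) / p, where T(p) is the wall-clock time using p workers.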