Data Mining in Bioinformatics: A Case Study on RNA Motif Discovery

Dr. Jason Wang, Professor of Computer Science
New Jersey Institute of Technology


Abstract

This talk gives an overview of data mining problems in bioinformatics, with a focus on RNA motif discovery. The "central dogma" of molecular biology refers to the basic flow of information from DNA to messenger RNA (mRNA) to protein. In this flow, a DNA sequence is transcribed into an mRNA sequence, which is then translated into a protein sequence. RNA motifs, often identified in wet-lab experiments conducted by bench biologists, are small segments of RNA sequences or RNA structures. These motifs have been shown to play various roles in post-transcriptional control, including mRNA translation, mRNA stability, and gene regulation. Computational bioinformaticians develop software tools to detect RNA motifs in genomes automatically. Their findings are then validated through wet-lab experiments or against the biomedical literature, and may sometimes lead to new discoveries concerning the functions of RNA. In this talk, we present a toolkit called RADAR (an acronym for RNA Data Analysis and Research) for finding motifs in RNA structures. The toolkit employs an efficient dynamic programming algorithm for aligning two or more RNA secondary structures. RADAR is being applied to mRNAs from several organisms, including human, mouse, rat, Drosophila, viruses, and trypanosomes, and preliminary results are promising. RADAR is part of a long-term project that aims to build a cyber infrastructure for RNA data mining and data integration. This cyber infrastructure enables access to, and retrieval, comparison, analysis, and discovery of, biologically significant RNA motifs over the Internet, as well as the integration of these motifs with online biomedical ontologies. The presentation will conclude by pointing out some directions for future work and open research challenges.