Microbiome Informatics: Deciphering Microscopic Life And Its Interactions In The Body And The World

Dr. Gail Rosen
Electrical and Computer Engineering Department, Drexel University


Abstract

DNA sequencing is outpacing Moore's law and is opening up fields of research that were unimaginable a decade ago. Until recently, researchers relied on isolating and culturing individual organisms, an approach that works for fewer than 10% of species. This limitation has prevented researchers from understanding the coexistence and interactions of species and the complex ecology of the thousands of microbes present in the human body and in each gram of soil. With the advent of high-throughput technologies that can sequence DNA from the massive number of organisms in numerous environmental samples, we can now compare microbial DNA from ecosystems such as soil, oceans, lakes, and especially the human body, in order to understand Earth's carbon and nitrogen cycles and the complex host-microbe interactions of disease. While traditional bioinformatics usually annotates full-length (sequenced and assembled) genomes, metagenomic algorithms must be designed to analyze fragments of genomes and to handle the majority of organisms, which have never been encountered before (i.e., are "unknown" to any database) but still require identification and annotation. In this talk, I will introduce a classification pipeline that identifies known organisms and labels them using a supervised approach, and then groups unknown or novel organisms using an unsupervised approach. My lab has benchmarked known/novel detection algorithms for whole-genome shotgun sequences, and we are the first to implement a confidence threshold for each assignment. Using both taxonomic and functional assignments, I will show feature selection techniques that enhance the comparison of samples, and I will demonstrate promising results for correlating particular protein families with aging. Finally, I will discuss our ongoing collaborations with clinical and environmental science investigators.
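
To make the known/novel workflow in the abstract concrete, here is a minimal Python sketch of one way such a pipeline could be organized. It is an illustration only, not the lab's implementation: the use of k-mer count profiles with cosine similarity, the greedy "leader" clustering, the 0.8 threshold, k = 3, and every function name below are assumptions chosen for brevity. A read is labeled with its best-matching reference only if the similarity score clears the confidence threshold; reads that fall below it are treated as novel and grouped without labels.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(read, references, threshold=0.8, k=3):
    """Supervised step: return (label, score); label is None below the threshold."""
    profile = kmer_profile(read, k)
    label, score = max(
        ((name, cosine(profile, ref)) for name, ref in references.items()),
        key=lambda pair: pair[1],
    )
    return (label if score >= threshold else None), score


def cluster_novel(reads, threshold=0.8, k=3):
    """Unsupervised step: greedy leader clustering of unassigned reads."""
    clusters = []  # each entry: (profile of the first read in the cluster, member reads)
    for read in reads:
        profile = kmer_profile(read, k)
        for leader, members in clusters:
            if cosine(profile, leader) >= threshold:
                members.append(read)
                break
        else:
            # No existing cluster is similar enough, so this read starts a new one.
            clusters.append((profile, [read]))
    return [members for _, members in clusters]


def run_pipeline(reads, reference_genomes, threshold=0.8, k=3):
    """Label confident matches, then cluster everything else as novel."""
    references = {name: kmer_profile(seq, k) for name, seq in reference_genomes.items()}
    known, novel = {}, []
    for read in reads:
        label, score = classify(read, references, threshold, k)
        if label is None:
            novel.append(read)
        else:
            known[read] = (label, score)
    return known, cluster_novel(novel, threshold, k)


if __name__ == "__main__":
    # Toy data, purely hypothetical: two "reference genomes" and five short reads.
    toy_refs = {"at_rich": "ATATATATATATATATATAT", "gc_rich": "GCGCGCGCGCGCGCGCGCGC"}
    toy_reads = ["ATATATATAT", "GCGCGCGCGC", "AAAAAAAAAA", "CCCCCCCCCC", "AAAAAAAAAAAA"]
    known, novel_clusters = run_pipeline(toy_reads, toy_refs)
    print(known)
    print(novel_clusters)
```

In this toy run, the two reads that resemble a reference are labeled, while the poly-A and poly-C reads fall below the threshold and are grouped into separate novel clusters. A real pipeline would need at least one reference genome, a much larger k, and a statistically calibrated confidence score rather than a fixed cosine cutoff.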