Feature Engineering for Large Scale Predictive Modeling with Electronic Health Records

Dr. Fei Wang, Researcher


Predictive modeling lies at the heart of many medical informatics problems, such as early detection of chronic diseases and prediction of patient hospitalization or readmission. Typically, these predictive models are built on patient Electronic Health Records (EHRs), which are systematic collections of patient information including demographics, diagnoses, medications, lab tests, etc. We refer to this information as patient features. High-quality features are vital to building successful predictive models. In this talk, I will present two feature engineering techniques that improve the quality of the raw features extracted from the original patient EHRs: (1) feature augmentation, which constructs more effective derived features from existing raw features by exploring the sequential order of events; and (2) feature densification, which imputes missing feature values via knowledge transfer across similar patients. For each technique, we also developed a visual interface that helps the user explore the derived features. Finally, I will introduce a parallel predictive modeling platform we built for efficiently training and testing large-scale predictive models.
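
To make the idea of feature densification concrete, the following is a minimal sketch of imputing a patient's missing values from the most similar patients. It is an illustration only and not the method presented in the talk: the function names `similarity` and `densify`, the similarity measure (based on mean squared difference over co-observed features), and the neighbor count `k` are all assumptions chosen for brevity.

```python
from typing import List, Optional

Row = List[Optional[float]]  # one patient's features; None marks a missing value


def similarity(a: Row, b: Row) -> float:
    """Hypothetical similarity in (0, 1], computed over co-observed features only.

    Returns 0.0 when the two patients share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    mean_sq_diff = sum((x - y) ** 2 for x, y in shared) / len(shared)
    return 1.0 / (1.0 + mean_sq_diff)


def densify(patients: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with a similarity-weighted mean of the
    k most similar patients that have that feature observed.

    Similarities are computed on the original (unfilled) data, so the
    result does not depend on the order in which values are imputed.
    A value stays None if no comparable donor exists.
    """
    filled = [list(row) for row in patients]
    for i, row in enumerate(patients):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other patients with feature j observed and nonzero similarity.
            donors = [
                (similarity(row, other), other[j])
                for m, other in enumerate(patients)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] > 0.0]
            donors.sort(key=lambda d: d[0], reverse=True)
            top = donors[:k]
            if top:
                total_weight = sum(w for w, _ in top)
                filled[i][j] = sum(w * v for w, v in top) / total_weight
    return filled


if __name__ == "__main__":
    # Toy data: three patients, three numeric features (made up for illustration).
    toy = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0],
    ]
    print(densify(toy, k=1))
```

In this sketch the first patient's missing third feature is borrowed from the second patient, whose observed features match exactly. A real EHR pipeline would additionally need feature scaling, handling of categorical codes, and a similarity measure validated for the clinical setting.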