DMS Statistics Seminar Series

Department of Mathematical Sciences
and
Center for Applied Mathematics and Statistics

New Jersey Institute of Technology

SPRING 2012

All seminars are held from 4:00 to 5:00 p.m. in Cullimore Hall Room 611 (Math Conference Room) unless noted otherwise. Refreshments are usually served at 3:30 p.m. If you have any questions about a particular seminar, please contact the person hosting the speaker.

Date and Time	Speaker	Title	Host
Thursday January 19, 2012, 4:00PM	Michael Ehrlich, Ph.D., Department of Finance, New Jersey Institute of Technology	Opportunities in Applied Finance (abstract)	Sunil Dhar
Thursday February 16, 2012, 4:00PM	Amy Davidow, Associate Professor, Department of Preventive Medicine & Community Health, University of Medicine and Dentistry of New Jersey, Newark, NJ	Diagnostic accuracy studies and spectrum bias (abstract)	Wenge Guo
Wednesday February 22, 2012, 11:30AM	Ji Meng Loh, Ph.D., AT&T Labs - Research, Florham Park, NJ	Accounting for Spatial Correlation in the Scan Statistic (abstract)	Aridaman Jain
Thursday February 23, 2012, 4:00PM	Huaihou Chen, Ph.D., Department of Biostatistics, Columbia University	Flexible models and methods for longitudinal and multilevel functional data (abstract)	Sunil Dhar
Wednesday February 29, 2012, 10:30AM	Gavino Puggioni, Ph.D., Real Lab, Biology Department, Emory University	Bayesian Hierarchical Models with Dynamic Structures and Latent Variables: Methodology and Applications (abstract)	Sundar Subramanian
Thursday March 1, 2012, 4:00PM	Weili He, Ph.D., Director, Biostatistics and Research Decision Sciences, Merck Sharp & Dohme Corporation, Rahway, NJ	A framework for joint modeling and joint assessment of efficacy and safety endpoints for probability of success evaluation and optimal dose selection (abstract)	Sunil Dhar
Thursday March 8, 2012, 4:00PM, Cullimore Lecture Hall I	Kai Zhang, Department of Statistics, The Wharton School, University of Pennsylvania	Valid Post-Selection Inference (abstract)	Wenge Guo
Tuesday March 20, 2012, 1:00PM	Li He, Department of Statistics, Temple University	Optimal Multiple Testing Procedure Incorporating Signal Strength (abstract)	Sunil Dhar
Thursday March 22, 2012, 4:00PM, Cullimore 111	Xiaohong Huang, Ph.D., Principal Statistician, Biostatistics & Programming, Sanofi, Bridgewater, NJ	Estimation of treatment effect for the sequential parallel design (abstract)	Sunil Dhar
Thursday March 29, 2012, 4:00PM	Brian L. Egleston, Ph.D., Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA	The impact of misclassification due to survey response fatigue on estimation and identifiability of treatment effects (abstract)	Sunil Dhar
Thursday April 12, 2012, 4:00PM, Cullimore 111	Michael Tortorella, Professor, Industrial and Systems Engineering Department, Rutgers University	Some Explorations of the Kaplan-Meier Estimator in the Presence of Immunes (abstract)	Aridaman Jain

ABSTRACTS

Opportunities in Applied Finance:

Applied finance is one of the hottest areas in business research. Many academics and practitioners are interested in understanding how and why money moves as it does. Financial managers and hedge funds pay top dollar for analysts who can help them gain a competitive advantage. There has also been an explosion of data and of opportunities for new analysis. We will discuss some current topics and opportunities for collaboration.

Michael Ehrlich, Ph.D., Department of Finance, New Jersey Institute of Technology ~ January 19, 2012

Diagnostic accuracy studies and spectrum bias:

The meta-analysis of diagnostic accuracy studies has assumed that random variation can explain the variability in sensitivity and specificity across studies. Although sensitivity and specificity are defined with respect to patient characteristics, empirical studies have noted variability associated with disease prevalence. Using elementary probability theory, we present a model of specificity for a three-stage disease process and show how specificity may be influenced by the distribution of stage-specific disease entities.

Amy Davidow, Associate Professor, Department of Preventive Medicine & Community Health, University of Medicine and Dentistry of New Jersey, Newark ~ February 16, 2012

Accounting for Spatial Correlation in the Scan Statistic:

The spatial scan statistic has been widely used in epidemiology as a tool to identify hotspots, or clusters, in the occurrence of diseases. An underlying assumption of the procedure is that disease incidence is independent across locations. However, it is sometimes more realistic (e.g., for contagious diseases) to allow correlation between the disease counts of neighboring regions. Using a simulation study and a data example, we show that when spatial correlation or overdispersion is present, the spatial scan statistic identifies significant disease clusters too often. We relate this problem of excessive false positives to similar work by Efron (2004; 2007). Finally, we introduce a simple procedure for obtaining modified p-values for identified clusters that accounts for the observed correlation in the data. We show how this procedure helps reduce the number of false positives and how it changes the results of our data analyses.

Ji Meng Loh, Ph.D., AT&T Labs - Research, Florham Park, NJ ~ February 22, 2012
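To make the independence assumption discussed in the abstract above concrete, here is a minimal sketch of the classical Poisson-based scan statistic (Kulldorff's log-likelihood ratio) with a Monte Carlo p-value. It is not the speaker's modified procedure, which the abstract does not specify. It assumes, for simplicity, that regions lie on a line and that candidate zones are contiguous windows (practical implementations use circular or elliptical windows on a map); the function names and data are hypothetical. Note that the null simulation re-allocates cases independently across regions, which is exactly the assumption the talk argues produces too many false positives when counts are spatially correlated or overdispersed.

```python
import math
import random
from collections import Counter


def poisson_llr(c, e, total):
    """Kulldorff-style Poisson log-likelihood ratio for one candidate zone.

    c: observed cases in the zone, e: expected cases under the null,
    total: total cases. Only excesses (c > e) count as clusters.
    """
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term is 0 * log(0) = 0 when c == total
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def scan_statistic(cases, pops, max_len):
    """Maximum LLR over contiguous windows of at most max_len regions (1-D toy map)."""
    total, pop_total, n = sum(cases), sum(pops), len(cases)
    best, best_zone = 0.0, None
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            c = sum(cases[start:end])
            e = total * sum(pops[start:end]) / pop_total
            llr = poisson_llr(c, e, total)
            if llr > best:
                best, best_zone = llr, (start, end)
    return best, best_zone


def monte_carlo_p(cases, pops, max_len, n_sim=999, seed=0):
    """Monte Carlo p-value: cases are re-allocated independently in proportion
    to population, i.e. under the independence assumption the talk questions."""
    rng = random.Random(seed)
    observed, zone = scan_statistic(cases, pops, max_len)
    total, n = sum(cases), len(cases)
    exceed = 0
    for _ in range(n_sim):
        counts = Counter(rng.choices(range(n), weights=pops, k=total))
        sim_stat, _ = scan_statistic([counts[i] for i in range(n)], pops, max_len)
        exceed += sim_stat >= observed
    return zone, observed, (exceed + 1) / (n_sim + 1)


# Hypothetical data: ten equal-population regions with an excess
# in regions 4 and 5 (0-indexed), reported as the half-open window (4, 6).
pops = [1000] * 10
cases = [5, 4, 6, 5, 20, 22, 5, 4, 6, 3]
print(monte_carlo_p(cases, pops, max_len=3))
```

If neighboring counts were positively correlated in reality, the independent re-allocation in `monte_carlo_p` would understate the null variability of the maximum LLR, so the resulting p-values would tend to be too small.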

Flexible models and methods for longitudinal and multilevel functional data:

In the first project, we propose penalized spline-based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as the sum of several terms: a population mean function, covariates with time-varying coefficients, functional random subject-specific curves, and residual measurement error processes. Using penalized splines, we propose nonparametric estimators of the population mean function, the varying coefficients, the random subject-specific deviations and their covariance function (which represents between-subject variation), and the variance function of the residual measurement errors (which represents within-subject variation). Decomposing the variability of the outcomes into between-subject and within-subject sources helps identify the dominant variance component and hence model the covariance function optimally. Furthermore, we study the asymptotic behavior of the baseline P-spline estimator. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data, in which we identify clearly distinct patterns in the between- and within-subject covariance functions of children's heights.

The second project is motivated by a clinical study in which patients undergo multiple 4-hour treatment cycles, and within each treatment cycle repeated measurements of the subjects' vital signs are recorded. These data have a natural multilevel structure, with treatment cycles nested within subjects and measurements nested within cycles. Most of the literature on nonparametric analysis of such multilevel functional data focuses on conditional approaches using functional mixed effects models. However, parameters obtained from conditional models do not have direct interpretations as population-average effects. When population effects are of interest, marginal regression models may be used instead. In this work, we propose marginal approaches for fitting multilevel functional data through penalized spline generalized estimating equations (penalized spline GEE). The procedure is effective for modeling correlated multilevel categorical outcomes as well as continuous outcomes, without suffering from numerical difficulties. We provide a new variance estimator that is robust to misspecification of the correlation structure. Finally, we apply the methods to the SAH study to evaluate a recent debate in the clinical community about discontinuing the use of Nimodipine.

Huaihou Chen, Ph.D., Department of Biostatistics, Mailman School of Public Health, Columbia University, NY 10032 ~ February 23, 2012

Bayesian Hierarchical Models with Dynamic Structures and Latent Variables: Methodology and Applications:

Mathematical and statistical modeling offer different insights into environmental and biological systems, and it is often desirable to combine the two approaches in a unified model. A statistical model with a rich mathematical structure offers great advantages in terms of flexibility and parameter interpretation, but the estimation process can be challenging. Bayesian hierarchical models are in many instances an ideal framework for overcoming some of these challenges. We propose methodological innovations that use stochastic differential equations for spatio-temporal data. Examples with both synthetic and real data are provided, with particular focus on applications to the spread of Sugar Cane Yellow Leaf virus and to wireless sensor data. Both examples also illustrate other challenges often encountered in environmental data analysis: identification, missing observations, and data collection design. The last part of the talk presents other ongoing research projects.

Gavino Puggioni, Ph.D., Real Lab, Biology Department, Emory University, Atlanta, GA 30033 ~ February 29, 2012

A framework for joint modeling and joint assessment of efficacy and safety endpoints for probability of success evaluation and optimal dose selection:

Keywords: joint modeling; joint evaluation; Bayesian method; probability of success; MCMC; clinical utility index

The evaluation of clinical proof of concept, optimal dose selection, and phase III probability of success has traditionally relied on a subjective, qualitative assessment of the efficacy and safety data. This practice was partly responsible for numerous failed phase III programs in the past. The need for more quantitative approaches to assessing efficacy and safety profiles has never been greater. We propose a framework that incorporates efficacy and safety data simultaneously for the joint evaluation of clinical proof of concept, optimal dose selection, and phase III probability of success. Simulation studies were conducted to evaluate the properties of the proposed methods, and the approach was applied to two real clinical studies. Judged against the actual outcomes of these two studies, the assessments based on the proposed approach suggested a reasonable path forward for both clinical programs.

Weili He, Ph.D., Director, Biostatistics and Research Decision Sciences, Merck Sharp & Dohme Corporation, Rahway, NJ 08889 ~ March 1, 2012

Valid Post-Selection Inference:

It is common practice in statistical data analysis to perform data-driven model selection and then derive statistical inference from the selected model as if it had been chosen in advance. Such inference is generally invalid. We propose to produce valid “post-selection inference” by reducing the problem to one of simultaneous inference: simultaneity is required over all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity insurance” for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for any particular selection procedure, but it is always more precise than full Scheffé protection. We describe the structure of the simultaneous inference problem and give some asymptotic results. We also develop an algorithm for numerically computing the width of the new confidence intervals.

Kai Zhang, Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104 ~ March 8, 2012

Optimal Multiple Testing Procedure Incorporating Signal Strength:

This research incorporates signal strength into a multiple testing procedure using a decision-theoretic approach. Specifically, we propose a general loss function that accounts for the severity of Type II errors. We also propose error rates that incorporate signal strength: the weighted marginal false discovery rate and the weighted marginal false nondiscovery rate. We then derive an optimal procedure that minimizes the weighted marginal false nondiscovery rate while controlling the weighted marginal false discovery rate. Numerical studies show that incorporating signal strength can yield substantial gains in power.

Dr. Li He, Department of Statistics, Temple University, Philadelphia, PA 19122 ~ March 20, 2012

Estimation of treatment effect for the sequential parallel design:

The sequential parallel clinical trial is a novel design used in psychiatric disorders that are known to have potentially high placebo response rates. The design consists of an initial parallel trial of placebo versus drug, followed by a second parallel trial of placebo versus drug in the placebo non-responders from the initial trial. Statistical research on the design has focused on hypothesis testing. However, an equally important output of any clinical trial is the estimate of the treatment effect and the variability around that estimate. In the sequential parallel trial, the most important treatment effect is the effect in the overall population. This effect can be estimated from the first phase of the trial alone, but doing so ignores useful information from the second phase. We develop estimates of the treatment effect that incorporate data from both phases of the trial. Our simulations and a real data example suggest that incorporating data from both phases can yield substantial gains in precision. The potential gains appear to be greatest in moderate-sized trials, as is typical of phase II trials.

Xiaohong Huang, Ph.D., Principal Statistician, Biostatistics & Programming, Sanofi, Bridgewater, NJ 08807 ~ March 22, 2012

The impact of misclassification due to survey response fatigue on estimation and identifiability of treatment effects:

Response fatigue can cause measurement error and misclassification in survey research: questions asked later in a long survey are often more prone to measurement error or misclassification. The response given is a function of both the true response and the participant's response fatigue. We investigate the identifiability of survey order effects and their impact on estimators of treatment effects. The focus is on fatigue that affects the answer given to a question rather than fatigue that causes non-response and missing data. We consider linear, Gamma, and logistic models of response that incorporate both the true underlying response and the effect of question order. For continuous data, survey order effects have no impact on study power under a Gamma model. However, under a linear model that allows responses to converge to a common mean, the impact of fatigue on power depends on how fatigue affects both the rate of convergence to the mean and the variance of responses. For binary data with less than a 50% chance of a positive response, order effects increase study power under a linear probability (risk difference) model but decrease it under a logistic model. These results suggest that measures designed to reduce survey order effects might have unintended consequences. We present a data example that demonstrates the problem of survey order effects.

Brian L. Egleston, Ph.D., Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA ~ March 29, 2012

Some Explorations of the Kaplan-Meier Estimator in the Presence of Immunes:

When some portion of a population will never experience the event whose occurrence is being recorded, the censored records contain two types of population members: those who are susceptible to the event but whose lifetime observation is censored, and those (the immunes) whose lifetime is essentially infinite. We discuss some possible modifications of the Kaplan-Meier estimator to deal with this situation. Some improvement in performance over the standard estimator is possible in this setting.

Michael Tortorella, Professor, Industrial and Systems Engineering Department, Rutgers University, New Brunswick, NJ 08854 ~ April 12, 2012
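For reference, the following is a minimal sketch of the standard, unmodified Kaplan-Meier product-limit estimator discussed in the abstract above; it does not implement the speaker's modifications, which the abstract does not describe. The toy data are hypothetical: four observed events, then six censored records, four of which are still event-free at the end of follow-up, as immunes would be. Because the standard estimator cannot tell censored susceptibles from immunes, the estimated survival curve levels off at 4/7 instead of falling toward zero.

```python
def kaplan_meier(times, events):
    """Standard Kaplan-Meier product-limit estimate.

    times:  observed times (event or censoring)
    events: 1 if the event was observed, 0 if the record is censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records sharing time t; censorings at t still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical toy data: the four records censored at t = 10 play the role of immunes.
times = [1, 2, 2, 3, 4, 5, 10, 10, 10, 10]
events = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.3f}")
# prints S(t) = 0.900, 0.800, 0.686, 0.571 at t = 1, 2, 3, 4
```

The height of such a plateau is sometimes read informally as a crude indication of the immune fraction, which is one reason the presence of immunes changes how the standard estimate should be interpreted.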