Opportunities in Applied Finance:
One of the hottest areas in business research is applied finance. Many academics and
practitioners are interested in understanding how and why money moves as it
does. Financial managers and hedge funds pay top dollar for analysts who can help them gain a
competitive advantage. There has also been an explosion of data and
opportunities for new analysis. We will discuss some current topics and
opportunities for collaboration.
Michael Ehrlich, Ph.D., Department of Finance,
New Jersey Institute of Technology ~ January 19, 2012
|
Diagnostic accuracy studies
and spectrum bias:
The meta-analysis of diagnostic accuracy studies has assumed that
random variation can explain the variability in sensitivity and
specificity. Although sensitivity and specificity are defined with respect to patient characteristics,
empirical studies have noted variability associated with disease
prevalence. Using elementary probability theory, we present a model of specificity for a three-stage
disease process and show how specificity may be influenced by the
distribution of stage-specific disease entities.
Amy Davidow, Associate Professor, Department of Preventive
Medicine & Community Health, University of Medicine and Dentistry of New
Jersey, Newark ~ February 16, 2012
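As a rough numerical illustration of the idea in the abstract above (a sketch under stated assumptions, not the speaker's model): if the probability of a negative test differs by disease stage, the law of total probability makes the pooled specificity a mixture over the stage distribution. The function name, the per-stage values, and the two stage mixes below are all hypothetical.

```python
def pooled_specificity(stage_specificity, stage_mix):
    """Pooled specificity as a mixture over stages (law of total probability).

    stage_specificity: P(test negative | stage k), one value per stage.
    stage_mix: P(stage k) among the subjects counted as non-cases; sums to 1.
    """
    if abs(sum(stage_mix) - 1.0) > 1e-9:
        raise ValueError("stage_mix must sum to 1")
    return sum(s * w for s, w in zip(stage_specificity, stage_mix))


# Hypothetical per-stage probabilities of a negative test (assumed values).
spec_by_stage = [0.98, 0.90, 0.70]

# Two hypothetical study populations with different stage distributions.
mix_a = [0.80, 0.15, 0.05]
mix_b = [0.40, 0.30, 0.30]

print(round(pooled_specificity(spec_by_stage, mix_a), 3))
print(round(pooled_specificity(spec_by_stage, mix_b), 3))
```

With these assumed numbers the same test shows a pooled specificity of about 0.954 in the first population and about 0.872 in the second, even though no per-stage value changes.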
|
Accounting for Spatial
Correlation in the Scan Statistic:
The spatial scan statistic has been widely used in
epidemiology as a tool to identify hotspots or clusters in the occurrence of
diseases. An underlying assumption of the procedure is the independence of
disease incidence between different locations. However, it is sometimes more
realistic (e.g. for contagious diseases) to have correlation in the disease
counts of neighboring regions. Using a simulation study and a data example,
we show that when spatial correlation or overdispersion
is present, the spatial scan statistic identifies significant disease
clusters too frequently. We relate this issue of excessive false positives to
similar work in Efron (2004; 2007). Finally, we
introduce a simple procedure to obtain modified p-values of identified
clusters that account for observed correlation in the data. We show how this
procedure helps to reduce the number of false positives, and how it changes
the results of our data analyses.
Ji Meng Loh, Ph.D., AT&T Labs - Research,
Florham Park, NJ ~ February 22, 2012
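For readers unfamiliar with the statistic discussed above, the following is a minimal sketch of the standard Poisson spatial scan statistic, which scores each candidate zone by a log-likelihood ratio and reports the maximum. It does not implement the modified p-value procedure from the talk. The function names, the toy counts, and the candidate zones are hypothetical; expected counts are assumed to be scaled so that they sum to the total number of cases.

```python
import math


def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for one zone under the Poisson scan model.

    Returns 0 unless the zone has more cases than expected.
    """
    if observed <= expected:
        return 0.0
    llr = observed * math.log(observed / expected)
    if total > observed:
        llr += (total - observed) * math.log(
            (total - observed) / (total - expected)
        )
    return llr


def scan_statistic(cases, expected, zones):
    """Return (best_zone, max_llr) over candidate zones.

    cases, expected: per-region counts; expected sums to sum(cases).
    zones: iterable of tuples of region indices (hypothetical candidates).
    """
    total = sum(cases)
    best_zone, best_llr = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = sum(expected[i] for i in zone)
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Toy data (assumed): five regions, 100 cases in total.
cases = [20, 12, 8, 30, 30]
expected = [10, 10, 10, 35, 35]
zones = [(0,), (0, 1), (1, 2), (3, 4)]

best_zone, best_llr = scan_statistic(cases, expected, zones)
print(best_zone, round(best_llr, 3))
```

In practice the significance of the maximum is usually judged by Monte Carlo replication with counts generated independently across regions, which is the independence assumption the abstract calls into question.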
|
Flexible models and methods
for longitudinal and multilevel functional data:
In the first work, we propose penalized spline-based methods for functional mixed effects models
with varying coefficients. We decompose longitudinal outcomes as a sum of
several terms: a population mean function,
covariates with time-varying coefficients, functional random subject-specific
curves, and residual measurement error processes. Using penalized splines, we propose nonparametric estimation of the
population mean function, the varying coefficients, the random subject-specific
deviations, the associated covariance function, which represents
between-subject variation, and the variance function of the residual measurement
errors, which represents within-subject variation. Decomposing the variability of the
outcomes into between-subject and within-subject sources is useful in
identifying the dominant variance component and therefore in optimally modeling the
covariance function. Furthermore, we study the asymptotic behavior of the
baseline P-spline estimator. The benefit of the
between- and within-subject covariance decomposition is illustrated through
an analysis of the Berkeley growth data, where we identified clearly distinct patterns in the between- and
within-subject covariance functions of children's heights.
The second work is motivated by a clinical study in which
patients undergo multiple 4-hour treatment cycles and, within each treatment
cycle, repeated measurements of subjects' vital signs are recorded. These data
have a natural multilevel structure, with treatment cycles nested within
subjects and measurements nested within cycles. Most literature on
nonparametric analysis of such multilevel functional data focuses on
conditional approaches using functional mixed effects models. However,
parameters obtained from the conditional models do not have direct
interpretations as population average effects. When population effects are of
interest, we may employ marginal regression models. In this work, we propose
marginal approaches to fit multilevel functional data through penalized spline generalized estimating equations (penalized spline GEE). The procedure is effective for modeling
multilevel correlated categorical outcomes as well as continuous outcomes
without suffering from numerical difficulties. We provide a new variance
estimator robust to misspecification of correlation structure. Finally, we
apply the methods to the SAH study to evaluate a recent debate on
discontinuing the use of Nimodipine in the clinical
community.
Huaihou Chen, Ph.D.,
Department of Biostatistics, Mailman School of Public Health, Columbia
University, NY 10032 ~ February 23, 2012
|
Bayesian Hierarchical Models
with Dynamic Structures and Latent Variables: Methodology and Applications:
Mathematical and statistical modeling offer different
insights into environmental and biological systems. Often it is
desirable to combine the two approaches in a unified model. A statistical
model with a rich mathematical structure offers great advantages in terms of
flexibility and parameter interpretation, but the estimation process can be
rather challenging. Bayesian hierarchical models are in many instances an
ideal framework to overcome some of these challenges. We propose some
methodological innovations that involve the use of stochastic differential
equations for spatio-temporal data. Examples with
both synthetic and real data are provided. Particular focus will be given to
applications to the Sugar Cane Yellow Leaf virus spread and to wireless
sensor data. Both examples will also illustrate other challenges that are
often encountered in environmental data analysis: identification, missing
observations, and data collection design. The last part of the talk is
dedicated to showing other ongoing research projects.
Gavino Puggioni, Ph.D.,
Real Lab, Biology Department, Emory University, Atlanta, GA 30033 ~ February 29, 2012
|
A framework for joint modeling
and joint assessment of efficacy and safety endpoints for
probability of success evaluation and optimal dose selection:
Keywords: joint modeling; joint evaluation; Bayesian method;
probability of success; MCMC; clinical utility index
The evaluation of clinical proof of concept, optimal dose
selection, and phase III probability of success has traditionally been
conducted by a subjective and qualitative assessment of the efficacy and
safety data. This, in part, was responsible for the numerous failed phase III
programs in the past. The need to utilize more quantitative approaches to
assess efficacy and safety profiles has never been greater. In this paper, we
propose a framework that incorporates efficacy and safety data simultaneously
for the joint evaluation of clinical proof of concept, optimal dose
selection, and phase III probability of success. Simulation studies were
conducted to evaluate the properties of our proposed methods. The proposed
approach was applied to two real clinical studies. On the basis of the true
outcome of the two clinical studies, the assessment based on our proposed
approach suggested a reasonable path forward for both clinical programs.
Weili He, Ph.D., Director, Biostatistics and
Research Decision Sciences, Merck Sharp & Dohme
Corporation, Rahway, NJ 08889 ~ March 1, 2012
|
Valid Post-Selection Inference:
It is common practice in statistical data analysis to
perform data-driven model selection and derive statistical inference from the
selected model as if this model was known in advance. Such inference is generally
invalid. We propose to produce valid “post-selection inference”
by reducing the problem to one of simultaneous inference. Simultaneity is
required for all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity
insurance” for all possible submodels, the
resulting post-selection inference is rendered universally valid under all
possible model selection procedures. This inference is therefore generally
conservative for particular selection procedures, but it is always more
precise than full Scheffé protection. We describe
the structure of the simultaneous inference problem and give some asymptotic
results. We also develop an algorithm for numerical computation of the width
of our new confidence intervals.
Kai Zhang, Department of Statistics, The
Wharton School, University of Pennsylvania, Philadelphia, PA 19104 ~ March 8, 2012
|
Optimal Multiple Testing Procedure Incorporating Signal Strength:
This research focuses on incorporating signal strength into a multiple
testing procedure using a decision theoretic approach. Specifically, we
propose a general loss function which incorporates the severity of Type II
errors. We also propose error rates that incorporate signal strength: the
weighted marginal false discovery rate and the weighted marginal false nondiscovery rate. We then derive an optimal procedure that
minimizes the weighted marginal false nondiscovery
rate while controlling the weighted marginal false discovery rate. The
numerical studies show that, by incorporating signal strength, much power can
be gained.
Dr. Li He, Department of Statistics, Temple University,
Philadelphia, PA 19122 ~ March 20, 2012
|
Estimation of treatment
effect for the sequential parallel design:
The sequential parallel clinical trial is a novel clinical trial
design being used in psychiatric diseases that are known to have potentially
high placebo response rates. The design consists of an initial parallel trial
of placebo versus drug augmented by a second parallel trial of placebo versus
drug in the placebo non-responders from the initial trial. Statistical
research on the design has focused on hypothesis tests. However, an equally
important output from any clinical trial is the estimate of treatment effect
and variability around that estimate. In the sequential parallel trial, the
most important treatment effect is the effect in the overall population. This
effect can be estimated by considering only the first phase of the trial, but
this ignores useful information from the second phase of the trial. We
develop estimates of treatment effect that incorporate data from both phases
of the trial. Our simulations and a real data example suggest that there can
be substantial gains in precision by incorporating data from both phases. The
potential gains appear to be greatest in moderate-sized trials, which would
typically be the case in phase II trials.
Xiaohong Huang, Ph.D.,
Principal Statistician, Biostatistics & Programming, Sanofi,
Bridgewater, NJ 08807 ~ March 22, 2012
|
The impact of misclassification due to survey response
fatigue on estimation and identifiability of
treatment effects:
Response fatigue can cause measurement error and misclassification
problems in survey research. Questions asked later in a long survey are often
prone to more measurement error or misclassification. The response given is a
function of both the true response and participant response fatigue. We
investigate the identifiability of survey order
effects and their impact on estimators of treatment effects. The focus is on
fatigue that affects a given answer to a question rather than fatigue that
causes non-response and missing data. We consider linear, Gamma, and logistic
models of response that incorporate both the true underlying response and the
effect of question order. For continuous data, survey order effects have no
impact on study power under a Gamma model. However, under a linear model that
allows for convergence of responses to a common mean, the impact of fatigue
on power will depend on how fatigue affects both the rate of mean convergence
and the variance of responses. For binary data and for less than a 50% chance
of a positive response, order effects cause study power to increase under a
linear probability (risk difference) model but decrease under a logistic
model. The results suggest that measures designed to reduce survey order
effects might have unintended consequences. We present a data example that
demonstrates the problem of survey order effects.
Brian L. Egleston, Ph.D., Biostatistics
and Bioinformatics Facility, Fox Chase Cancer
Center, Philadelphia, PA ~ March 29, 2012
|
Some Explorations of the Kaplan-Meier Estimator in the Presence of Immunes:
When some portion of a population will never suffer the event
whose occurrence is being recorded, the censored records contain two types of
population members: those who are susceptible to the event but whose lifetime
observation is censored, and those whose lifetime is essentially
infinite. We discuss some possible modifications of the Kaplan-Meier estimate to deal with this
situation. Some improvement in performance is possible in this case.
Michael Tortorella, Professor, Industrial and Systems
Engineering Department, Rutgers University, New Brunswick, NJ 08854 ~ April 12, 2012
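As background for the abstract above, here is a minimal sketch of the standard (unmodified) Kaplan-Meier product-limit estimator. It does not reproduce the modifications discussed in the talk. The function name and the toy data are hypothetical. When the largest observed times are all censored, the estimated curve levels off above zero, and that plateau is sometimes read as a rough indication of the immune proportion.

```python
def kaplan_meier(times, events):
    """Standard Kaplan-Meier estimate.

    times: observed times (event or censoring).
    events: 1 if the event was observed, 0 if the record is censored.
    Returns a list of (t, S(t)) at each distinct time with an event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all records that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored records leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data (assumed): the three longest records are censored, as would
# happen if some subjects were immune to the event.
times = [2, 3, 3, 5, 8, 10, 10, 10]
events = [1, 1, 0, 1, 1, 0, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

In this toy sample the curve stops dropping after the last observed event and stays at about 0.45, because the remaining records are censored at the end of follow-up.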
|