Flexible Semiparametric Analysis of Longitudinal
Genetic Studies by Reduced Rank Smoothing:
In family-based longitudinal genetic studies, investigators collect
repeated measurements on a trait that changes with time along with genetic
markers. Since repeated measurements are nested within subjects and subjects
are nested within families, both the subject-level and measurement-level correlations
must be taken into account in the statistical analysis to achieve more
accurate estimation. In such studies, the primary interests include testing
for a quantitative trait locus (QTL) effect and estimating the age-specific QTL
effect and the residual polygenic heritability function. We propose
flexible semiparametric models along with their statistical estimation and
hypothesis testing procedures for longitudinal genetic designs. We employ
penalized splines to estimate nonparametric functions in the models. We find
that misspecifying the baseline function or the genetic effect function in a
parametric analysis may lead to a substantially inflated or highly conservative
type I error rate in testing and a large mean squared error in estimation. We
apply the proposed approaches to examine age-specific effects of genetic
variants reported in a recent genome-wide association study of blood pressure
collected in the Framingham Heart Study.
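The penalized-spline machinery the abstract relies on can be sketched in a few lines. The following is a minimal illustration only (the truncated power basis, knot placement, smoothing parameter, and simulated curve are my own choices, not those of the talk): a nonparametric function is estimated by ridge-penalized least squares on a spline basis.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = np.sort(rng.uniform(0, 1, n))
# Hypothetical smooth signal plus noise, standing in for a trait trajectory.
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Truncated power basis: intercept, linear term, and one piece per knot.
knots = np.linspace(0.05, 0.95, 15)
X = np.column_stack([np.ones(n), x] + [np.maximum(x - k, 0.0) for k in knots])

# Ridge penalty on the knot coefficients only -- the "penalized" in P-splines.
lam = 0.1
D = np.diag([0.0, 0.0] + [1.0] * len(knots))
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
fhat = X @ beta  # smooth estimate of the underlying function
```

In the mixed-model representation of penalized splines, the penalized knot coefficients act as random effects, which is what lets such smoothers be embedded in models with subject- and family-level correlation.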
Assistant Professor Yuanjia Wang,
Department of Biostatistics, Mailman School of Public Health, Columbia
University ~ February 10, 2011
|
Semi-parametric Hybrid Empirical Likelihood Inference
for Two-sample Comparison With Censored Data:
Two-sample comparison problems are often encountered in practical projects
and have been widely studied in the literature. Owing to practical demands,
research on this topic in special settings, such as a semiparametric
framework, has also attracted considerable attention. Zhou and Liang (2005)
proposed an empirical likelihood-based semi-parametric inference for the
comparison of treatment effects in a two-sample problem with censored data.
However, their approach is actually a pseudo-empirical likelihood and the
method may not be fully efficient. In this study, we develop a new
empirical likelihood-based inference under a more general framework by using
the hazard formulation of censored data for two-sample semi-parametric hybrid
models. We demonstrate that our empirical likelihood statistic converges to a
standard chi-squared distribution under the null hypothesis. We further
illustrate the use of the proposed test by testing the ROC curve with
censored data, among others. Numerical performance of the proposed method is
also examined.
Assistant
Professor Haiyan Su, Department of Mathematical Sciences, Montclair State
University, Montclair, NJ ~ February 17, 2011
|
Understanding and Improving Propensity Score
Methods:
Consider estimating the mean of
an outcome in the presence of missing data or estimating population average
treatment effects in causal inference. A doubly robust estimator remains
consistent if an outcome regression model or a propensity score model is
correctly specified. We build on a previous nonparametric likelihood approach
and propose new doubly robust estimators, which have desirable properties in
efficiency if the propensity score model is correctly specified, and in
boundedness even if the inverse probability weights are highly variable. We
compare the new and existing estimators in a simulation study and find that
the robustified likelihood estimators yield overall the smallest mean squared
errors.
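The doubly robust construction behind this work can be illustrated with the classical augmented inverse-probability-weighted (AIPW) estimator. This toy simulation is my own (both working models are taken to be correctly specified for simplicity); it shows the basic estimator, not the new robustified likelihood estimators of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))            # propensity: P(outcome observed | x)
r = rng.random(n) < p                   # observation (non-missingness) indicator
y = 2.0 + 3.0 * x + rng.normal(size=n)  # outcome; true mean E[Y] = 2.0

# Working models -- here both happen to be correct, for illustration.
p_hat = p                               # fitted propensity score
m_hat = 2.0 + 3.0 * x                   # fitted outcome regression E[Y | x]

# AIPW (doubly robust) estimator of E[Y]:
# consistent if EITHER p_hat or m_hat is correctly specified.
mu_dr = np.mean(r * y / p_hat - (r - p_hat) / p_hat * m_hat)
```

The augmentation term has mean zero whenever either model is right, which is the source of the double robustness; the boundedness issues mentioned in the abstract arise when `p_hat` is near zero and the inverse weights explode.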
Associate
Professor Zhiqiang Tan, Department of Statistics, Rutgers University,
Piscataway, NJ ~ February 24, 2011
|
A Fully
Bayesian Hidden Ising Model for ChIP-seq Data Analysis:
Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq)
is a powerful technique that is being used in a wide range of biological
studies including genome-wide measurements of protein-DNA interactions, DNA
methylation and histone modification. The vast amount of data and biases
introduced by sequencing and/or genome mapping pose new challenges, and call
for effective methods and fast computer programs for statistical analysis.
To systematically model ChIP-seq data, we build a dynamic signal profile for
each chromosome, and then model the profile using a fully Bayesian hidden
Ising model. The proposed model naturally takes into account spatial
dependency and global and local distributions of sequence tags. It can be
used for one-sample and two-sample analyses. Through model diagnosis, the
proposed method can detect falsely enriched regions caused by sequencing
and/or mapping errors, which is usually not offered by the existing
hypothesis-testing-based methods. The proposed method is illustrated using
three transcription factor ChIP-seq data sets and four mixed ChIP-seq data
sets, and compared with four popular and/or well-documented methods: MACS,
CisGenome, BayesPeak and SISSRs. The results indicate that the proposed
method achieves equivalent or higher sensitivity and spatial resolution in
detecting transcription factor binding sites with false discovery rate at a
much lower level.
Qianxing Mo, PhD,
Research Biostatistician, Department of Epidemiology and
Biostatistics, Memorial Sloan-Kettering Cancer Center ~ March 10, 2011
|
Parallel Gatekeeping
Procedures:
Keywords: hierarchically
ordered objectives, multiple endpoints, strong control of type I error,
gatekeeping strategy, serial gatekeeping, parallel gatekeeping.
It is becoming increasingly common to consider designs with multiple endpoints,
analyses and objectives in registration trials. This presentation will cover
recent developments in the area of novel testing strategies (gatekeeping) for
hierarchically ordered objectives. The analyses are often related to multiple
endpoints but can also represent dose-control comparisons, noninferiority and
superiority tests or inferences at different time points. This presentation
will cover serial, parallel, and tree-structured gatekeeping, with a focus on
parallel gatekeeping procedures. The weighted-Bonferroni-based parallel
gatekeeping procedure (and its extensions), the Dunnett-based parallel
gatekeeping procedure, and the Dunnett-Bonferroni-based parallel gatekeeping
procedure will be discussed in detail. If time allows, the stepwise version of
the weighted-Bonferroni-based procedure, general multi-stage gatekeeping, and
partitioning decision paths will also be discussed.
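The core parallel-gatekeeping idea for two ordered families can be sketched with a simple Bonferroni version (the function name and the simplified alpha-propagation rule are mine, not the exact procedures of the talk): each rejected primary hypothesis "releases" its share of alpha to the secondary family, so the gate opens partially rather than all-or-nothing as in serial gatekeeping.

```python
def bonferroni_parallel_gatekeeping(p_primary, p_secondary, alpha=0.05):
    """Two-family parallel gatekeeping via Bonferroni (illustrative sketch).

    Each primary hypothesis is tested at alpha/k1; the alpha released by
    rejected primaries is split (Bonferroni) across the secondary family.
    """
    k1 = len(p_primary)
    reject1 = [p <= alpha / k1 for p in p_primary]
    released = alpha * sum(reject1) / k1   # alpha carried forward to family 2
    k2 = len(p_secondary)
    reject2 = [p <= released / k2 for p in p_secondary] if k2 else []
    return reject1, reject2
```

With `p_primary = [0.01, 0.20]` at `alpha = 0.05`, only the first primary hypothesis is rejected, so the secondary family is tested at the reduced overall level 0.025; if no primary hypothesis is rejected, the gate stays closed and no secondary hypothesis can be rejected.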
Haiyan Xu, Ph.D., Johnson &
Johnson Pharmaceutical Research & Development, L.L.C., 1125
Trenton-Harbourton Road, Titusville, NJ 08560 ~ March 24, 2011
|
Generalized Linear Model under the Inverse Sampling
Scheme:
A generalized linear model is introduced for multi-way contingency tables
whose cell values represent frequency counts that follow an extended negative
multinomial distribution. This is an extension of the negative
multinomial log-linear model. The parameters of the new model are estimated,
and a test for the covariate parameters is derived. An application of the
log-linear model under the generalized inverse sampling scheme is
demonstrated by an example.
Sunil K. Dhar, PhD, Associate Professor, Department of Mathematical
Sciences, New Jersey Institute of Technology, Newark, NJ 07102 ~ March 31,
2011
|
Adaptive FWER
and FDR Control under Block Dependence:
Recently, a
number of adaptive multiple testing procedures have been proposed. However,
in a non-asymptotic setting, the FWER or FDR control of these adaptive
procedures is proved only under independence, although extensive simulation
studies have suggested that these procedures perform well under certain
types of dependence structure. In this talk, several variants of the
conventional adaptive Bonferroni and Benjamini-Hochberg methods will be
presented, along with proofs that these procedures control the FWER or FDR
under block dependence. Results of simulation studies
comparing the performances of these adaptive procedures with the
conventional FWER and FDR controlling procedures under dependence will also
be presented.
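The flavor of an adaptive procedure can be seen in a Storey-type adaptive Benjamini-Hochberg sketch. This is my own minimal version with a fixed tuning parameter `lam`; the variants presented in the talk differ in their null-proportion estimators and in the proofs of control under block dependence.

```python
import numpy as np

def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Adaptive BH: estimate the number of true nulls, then step up."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    # Storey-type (conservative, +1) estimate of the number of true nulls.
    m0_hat = (np.sum(p > lam) + 1.0) / (1.0 - lam)
    order = np.argsort(p)
    # Step-up comparison of ordered p-values with adaptive BH thresholds.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m0_hat
    k = int(np.max(np.nonzero(passed)[0]) + 1) if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Replacing `m0_hat` with `m` recovers the conventional BH procedure; the adaptive version gains power when a substantial fraction of hypotheses are non-null, which is exactly where the dependence question becomes delicate.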
Wenge Guo, PhD, Assistant Professor, Department of Mathematical
Sciences, New Jersey Institute of Technology, Newark, NJ 07102 ~
April 14,
2011
|
Integrative Analysis of Cancer Genomic Data:
In cancer genomic studies, markers identified from the analysis of
single datasets often suffer from a lack of reliability because of the small
sample sizes. A cost-effective remedy is to pool data from multiple
comparable studies and conduct integrative analysis. Integrative analysis of
multiple datasets is challenging because of the high dimensionality of
markers and, more importantly, because of the heterogeneity among studies. We
consider penalized approaches for marker selection in the integrative
analysis of multiple datasets. The proposed approaches can effectively
identify markers with consistent effects across multiple studies and
automatically accommodate the heterogeneity among studies. We establish the
asymptotic consistency properties, conduct simulations, and analyze
pancreatic and liver cancer studies.
Shuangge (Steven) Ma, PhD,
Assistant Professor of Public Health (Biostatistics), Yale School of Public
Health, New Haven, CT 06520-8034 ~ April 21, 2011
|
What is the best way to manage prostate cancer?
Causal inference using a geographical instrumental variable with the
SEER/Medicare cohort:
Knowing which of the competing treatment options is best
for prostate cancer is of critical interest to patients and physicians.
Population-based cohorts such as the SEER/Medicare linked database provide a
wealth of information on treatments and outcomes for prostate cancer, as
well as a range of potential confounder covariates, allowing one to find
associations between treatment and outcome. However, the presence of
unknown or unmeasured confounders presents difficulties in inferring
causative relationships from observed associations. The classical way to
establish causation is through randomized clinical trials, but these are
often difficult to conduct with elderly patients. Instrumental variable
analysis (IVA) is an alternative method for observational data that
captures many of the advantages that randomized trials offer for inferring
causation. In this talk I show how to construct a geographically-based
instrumental variable in the SEER/Medicare database to compare the
effectiveness of hormone therapy to active surveillance in elderly prostate
cancer patients. I put this in context by discussing other examples of IVA,
and describe future methodological directions.
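The core instrumental-variable logic can be sketched with a simulated binary instrument. The setup below is a toy of my own (names, coefficients, and the simple Wald ratio, standing in for the geographic instrument of the talk): because the instrument affects treatment but is independent of the unmeasured confounder, the IV estimate recovers the causal effect that naive regression misses.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
u = rng.normal(size=n)                        # unmeasured confounder
z = rng.integers(0, 2, size=n).astype(float)  # binary instrument (e.g. region)
# Instrument shifts treatment uptake; the confounder drives both t and y.
t = (z + u + rng.normal(size=n) > 0).astype(float)
y = 1.0 * t + u + rng.normal(size=n)          # true causal effect = 1.0

# Wald / IV estimate: Cov(y, z) / Cov(t, z).
beta_iv = np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]
# Naive regression slope, biased upward by the confounder u.
beta_naive = np.cov(y, t)[0, 1] / np.var(t)
```

Here `beta_iv` lands near the true effect of 1.0 while `beta_naive` is inflated by the confounding; the quality of the IV estimate depends on how strongly the instrument predicts treatment, which is the central practical concern with geographic instruments.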
Dirk F. Moore, PhD, Department of
Biostatistics, UMDNJ School of Public Health and Biometrics Division, Cancer
Institute of New Jersey ~ April 28, 2011
|