Statistics Seminar Series

Department of Mathematical Sciences
and
Center for Applied Mathematics and Statistics

New Jersey Institute of Technology

SPRING 2011

 

All seminars are 4:00 - 5:00 p.m., in Cullimore Hall Room 611 (Math Conference Room) unless noted otherwise. Refreshments are usually served at 3:30 p.m., and talks start at 4:00 p.m. If you have any questions about a particular seminar, please contact the person hosting the speaker.

 

Date

Speaker and Title

Host

Thursday
February 10, 2011
Cullimore 110
4:00PM

Yuanjia Wang, Ph.D., Department of Biostatistics, Mailman School of Public Health, Columbia University

Flexible Semiparametric Analysis of Longitudinal Genetic Studies by Reduced Rank Smoothing

 (abstract)

Sunil Dhar


Thursday

February 17, 2011
Cullimore Lecture Hall I 
4:00PM

Haiyan Su, Ph.D., Department of Mathematical Sciences, Montclair State University, Montclair, NJ
Semi-parametric Hybrid Empirical Likelihood Inference for Two-sample Comparison With Censored Data
 (abstract)

Sundar Subramanian

Thursday
February 24, 2011
4:00PM

Zhiqiang Tan, Ph.D., Department of Statistics, Rutgers University, Piscataway, NJ
Understanding and Improving Propensity Score Methods

(abstract)

Sunil Dhar

Thursday
March 10, 2011
Cullimore Lecture Hall I
4:00PM

Qianxing Mo, Ph.D., Research Biostatistician, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center

A Fully Bayesian Hidden Ising Model for ChIP-seq Data Analysis
(abstract)

Wenge Guo

Thursday

March 24, 2011
Cullimore Lecture Hall I

4:00PM

Haiyan Xu, Ph.D., Clinical Biostatistics, Johnson & Johnson Pharmaceutical Research & Development

Parallel Gatekeeping Procedures

(abstract)

Sunil Dhar

Thursday

March 31, 2011

4:00PM

Sunil K. Dhar, Ph.D., Department of Mathematical Sciences and the Center for Applied Mathematics and Statistics, New Jersey Institute of Technology
Generalized Linear Model under the Inverse Sampling Scheme
(abstract)

Thursday

April 14, 2011
Cullimore Lecture Hall I

4:00PM

Wenge Guo, Ph.D., Department of Mathematical Sciences and the Center for Applied Mathematics and Statistics, New Jersey Institute of Technology
Adaptive FWER and FDR Control under Block Dependence
(abstract)

Sunil Dhar

Thursday

April 21, 2011

4:00PM

Shuangge (Steven) Ma, Ph.D., Yale School of Public Health
Integrative Analysis of Cancer Genomic Data
(abstract)

Sunil Dhar

Thursday

April 28, 2011

4:00PM

Dirk F. Moore, Ph.D., Department of Biostatistics, UMDNJ School of Public Health, and Biometrics Division, Cancer Institute of New Jersey

What is the best way to manage prostate cancer? Causal inference using a geographical instrumental variable with the SEER/Medicare cohort.
(abstract)

Aridaman Jain

ABSTRACTS


Flexible Semiparametric Analysis of Longitudinal Genetic Studies by Reduced Rank Smoothing:

In family-based longitudinal genetic studies, investigators collect repeated measurements on a trait that changes with time, along with genetic markers. Since repeated measurements are nested within subjects and subjects are nested within families, both the subject-level and measurement-level correlations must be taken into account in the statistical analysis to achieve more accurate estimation. In such studies, the primary interests include testing for a quantitative trait locus (QTL) effect and estimating the age-specific QTL effect and the residual polygenic heritability function. We propose flexible semiparametric models, along with statistical estimation and hypothesis testing procedures, for longitudinal genetic designs. We employ penalized splines to estimate the nonparametric functions in the models. We find that misspecifying the baseline function or the genetic effect function in a parametric analysis may lead to a substantially inflated or highly conservative type I error rate in testing and a large mean squared error in estimation. We apply the proposed approaches to examine age-specific effects of genetic variants reported in a recent genome-wide association study of blood pressure collected in the Framingham Heart Study.

Assistant Professor Yuanjia Wang, Department of Biostatistics, Mailman School of Public Health, Columbia University ~ February 10, 2011
 

Semi-parametric Hybrid Empirical Likelihood Inference for Two-sample Comparison With Censored Data:

Two-sample comparison problems are often encountered in practical projects and have been widely studied in the literature. Owing to practical demands, research on this topic in special settings, such as the semiparametric framework, has also attracted great attention. Zhou and Liang (2005) proposed an empirical likelihood-based semiparametric inference for the comparison of treatment effects in a two-sample problem with censored data. However, their approach is actually a pseudo-empirical likelihood, and the method may not be fully efficient. In this study, we develop a new empirical likelihood-based inference under a more general framework by using the hazard formulation of censored data for two-sample semiparametric hybrid models. We demonstrate that our empirical likelihood statistic converges to a standard chi-squared distribution under the null hypothesis. We further illustrate the use of the proposed test by testing the ROC curve with censored data, among other applications. The numerical performance of the proposed method is also examined.

Assistant Professor Haiyan Su, Department of Mathematical Sciences, Montclair State University, Montclair, NJ ~ February 17, 2011
 


Understanding and Improving Propensity Score Methods:

Consider estimating the mean of an outcome in the presence of missing data, or estimating population average treatment effects in causal inference. A doubly robust estimator remains consistent if either an outcome regression model or a propensity score model is correctly specified. We build on a previous nonparametric likelihood approach and propose new doubly robust estimators, which have desirable efficiency properties if the propensity score model is correctly specified and remain bounded even if the inverse probability weights are highly variable. We compare the new and existing estimators in a simulation study and find that the robustified likelihood estimators yield the smallest mean squared errors overall.

Associate Professor Zhiqiang Tan, Department of Statistics, Rutgers University, Piscataway, NJ ~ February 24, 2011
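The doubly robust construction mentioned in this abstract can be illustrated with a minimal augmented inverse-probability-weighting (AIPW) estimator of a mean under data missing at random. This is a generic textbook sketch, not the speaker's proposed estimator; the function and variable names are illustrative, and the outcome-regression and propensity predictions are assumed to come from models fitted elsewhere.

```python
import numpy as np

def aipw_mean(y, observed, or_pred, ps_pred):
    """Doubly robust (AIPW) estimate of E[Y] when some outcomes are
    missing at random. or_pred: outcome-regression predictions for
    every unit; ps_pred: estimated probability of being observed.
    The estimate is consistent if either model is correct."""
    r = observed.astype(float)
    y_filled = np.where(observed, y, 0.0)  # placeholder for missing Y
    # regression prediction plus an inverse-probability-weighted residual
    return float(np.mean(or_pred + r * (y_filled - or_pred) / ps_pred))
```

If every outcome is observed and the propensities are 1, the estimator reduces to the sample mean regardless of the outcome-regression predictions, which is one way to see the "double" protection.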
 

A Fully Bayesian Hidden Ising Model for ChIP-seq Data Analysis:

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a powerful technique used in a wide range of biological studies, including genome-wide measurements of protein-DNA interactions, DNA methylation, and histone modification. The vast amount of data and the biases introduced by sequencing and/or genome mapping pose new challenges and call for effective methods and fast computer programs for statistical analysis. To systematically model ChIP-seq data, we build a dynamic signal profile for each chromosome and then model the profile using a fully Bayesian hidden Ising model. The proposed model naturally takes into account spatial dependency and the global and local distributions of sequence tags. It can be used for one-sample and two-sample analyses. Through model diagnosis, the proposed method can detect falsely enriched regions caused by sequencing and/or mapping errors, a feature that the existing hypothesis-testing-based methods usually do not offer. The proposed method is illustrated using three transcription factor ChIP-seq data sets and four mixed ChIP-seq data sets, and compared with four popular and/or well-documented methods: MACS, CisGenome, BayesPeak, and SISSRs. The results indicate that the proposed method achieves equivalent or higher sensitivity and spatial resolution in detecting transcription factor binding sites, while controlling the false discovery rate at a much lower level.

Qianxing Mo, Ph.D., Research Biostatistician, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center ~ March 10, 2011
 


Parallel Gatekeeping Procedures:

Keywords: hierarchically ordered objectives, multiple endpoints, strong control of type I error, gatekeeping strategy, serial gatekeeping, parallel gatekeeping.

It is becoming increasingly common to consider designs with multiple endpoints, analyses, and objectives in registration trials. This presentation will cover recent developments in novel testing strategies (gatekeeping) for hierarchically ordered objectives. The analyses are often related to multiple endpoints but can also represent dose-control comparisons, noninferiority and superiority tests, or inferences at different time points. The presentation will cover serial, parallel, and tree-structured gatekeeping, with a focus on parallel gatekeeping procedures. The weighted-Bonferroni-based parallel gatekeeping procedure (and its extensions), the Dunnett-based parallel gatekeeping procedure, and the Dunnett-Bonferroni-based parallel gatekeeping procedure will be discussed in detail. If time allows, the stepwise version of the weighted-Bonferroni-based procedure, general multistage gatekeeping, and partitioning decision paths will also be discussed.

Haiyan Xu, Ph.D., Clinical Biostatistics, Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 1125 Trenton-Harbourton Road, Titusville, NJ 08560 ~ March 24, 2011
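The parallel gatekeeping idea can be sketched for the simplest case: two families of hypotheses with equal Bonferroni weights, where the fraction of family-1 hypotheses rejected determines how much of the error rate is released to family 2. This is a minimal two-family sketch of the basic Bonferroni-based procedure under those stated assumptions, not the full weighted or Dunnett-based methodology of the talk.

```python
def parallel_gatekeeping(family1, family2, alpha=0.05):
    """Bonferroni-based parallel gatekeeping for two ordered families
    of p-values with equal weights. Family 2 is tested only at the
    portion of alpha released by rejections in family 1."""
    n1 = len(family1)
    rejected1 = [p <= alpha / n1 for p in family1]
    # error rate carried forward is proportional to the rejected fraction
    alpha2 = alpha * sum(rejected1) / n1
    n2 = len(family2)
    rejected2 = [alpha2 > 0 and p <= alpha2 / n2 for p in family2]
    return rejected1, rejected2
```

Unlike serial gatekeeping, family 2 can be tested even when only some of the family-1 hypotheses are rejected, just at a proportionally reduced level.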

Generalized Linear Model under the Inverse Sampling Scheme:

A generalized linear model is introduced for multi-way contingency tables whose cell values represent frequency counts following an extended negative multinomial distribution. This is an extension of the negative multinomial log-linear model. The parameters of the new model are estimated, and a test for the covariate parameters is derived. An application of the log-linear model under the generalized inverse sampling scheme is demonstrated with an example.

Sunil K. Dhar, PhD, Associate Professor, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102 ~ March 31, 2011
 


Adaptive FWER and FDR Control under Block Dependence:

Recently, a number of adaptive multiple testing procedures have been proposed. However, in a non-asymptotic setting, the FWER or FDR control of these adaptive procedures has been proved only under independence, although extensive simulation studies suggest that they perform well under certain types of dependence structure. In this talk, several variants of the conventional adaptive Bonferroni and Benjamini-Hochberg methods will be presented, along with proofs that these procedures control the FWER or FDR under block dependence. Results of simulation studies comparing the performance of these adaptive procedures with the conventional FWER and FDR controlling procedures under dependence will also be presented.

Wenge Guo, PhD, Assistant Professor, Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102 ~ April 14, 2011
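For readers unfamiliar with the adaptive idea, here is a minimal sketch of an adaptive Benjamini-Hochberg step-up procedure using a Storey-type plug-in estimate of the number of true nulls. This is one common choice of estimator; the specific adaptive variants analyzed in the talk may differ.

```python
def adaptive_bh(pvals, alpha=0.05, lam=0.5):
    """Adaptive BH: estimate the number of true nulls m0, then run the
    BH step-up procedure at the effectively inflated level alpha/m0_hat
    per rank instead of alpha/m."""
    m = len(pvals)
    # Storey-type estimate of m0 (p-values above lam are mostly nulls)
    m0_hat = min(m, (1 + sum(p > lam for p in pvals)) / (1 - lam))
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes its step-up threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m0_hat:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected
```

When m0_hat < m, the adaptive thresholds are larger than the conventional BH thresholds, which is where the power gain comes from; the talk's contribution concerns proving that such procedures still control the FDR under block dependence.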
 

 

Integrative Analysis of Cancer Genomic Data:


In cancer genomic studies, markers identified from the analysis of single datasets often suffer from a lack of reliability because of the small sample sizes. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple datasets is challenging because of the high dimensionality of markers and, more importantly, because of the heterogeneity among studies. We consider penalized approaches for marker selection in the integrative analysis of multiple datasets. The proposed approaches can effectively identify markers with consistent effects across multiple studies and automatically accommodate the heterogeneity among studies. We establish the asymptotic consistency properties, conduct simulations, and analyze pancreatic and liver cancer studies.

 

Shuangge (Steven) Ma, Ph.D., Assistant Professor of Public Health (Biostatistics), Yale School of Public Health, New Haven, CT 06520-8034 ~ April 21, 2011
 

 

What is the best way to manage prostate cancer? Causal inference using a geographical instrumental variable with the SEER/Medicare cohort:

 

Knowing which of several competing treatment options is best for prostate cancer is of critical interest to patients and physicians. Population-based cohorts such as the SEER/Medicare linked database provide a wealth of information on treatments and outcomes for prostate cancer, as well as a range of potential confounder covariates, allowing one to find associations between treatment and outcome. However, the presence of unknown or unmeasured confounders presents difficulties in inferring causal relationships from observed associations. The classical way to establish causation is through randomized clinical trials, but these are often difficult to set up with elderly patients. Instrumental variable analysis (IVA) is an alternative method for observational data that captures many of the advantages randomized trials offer for inferring causation. In this talk I show how to construct a geographically based instrumental variable in the SEER/Medicare database to compare the effectiveness of hormone therapy to active surveillance in elderly prostate cancer patients. I put this in context by discussing other examples of IVA, and describe future methodological directions.


Dirk F. Moore, Ph.D., Department of Biostatistics, UMDNJ School of Public Health, and Biometrics Division, Cancer Institute of New Jersey ~ April 28, 2011
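The instrumental-variable logic behind this talk can be illustrated with a generic two-stage least squares (2SLS) sketch on simulated data. This is a textbook IV estimator with a single instrument and hypothetical variable names, not the talk's specific SEER/Medicare analysis.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z.
    Stage 1 regresses the treatment on the instrument; stage 2
    regresses the outcome on the predicted treatment."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # stage 1 fit
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # stage 2 fit
    return beta[1]  # causal-effect estimate
```

With a valid instrument (correlated with treatment but independent of the unmeasured confounder, as a geographic practice-pattern variable is argued to be), the stage-2 coefficient consistently estimates the causal effect even though an ordinary regression of outcome on treatment is biased by confounding.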