A new criterion for variable selection in high-dimensional problems:
High-dimensional statistical problems arise in many fields, such as medical
studies, genomics, finance, and machine learning. Variable selection plays an
important role in high-dimensional statistical analysis, and many criteria
have been proposed in the statistical literature. In this talk I will discuss
a newly proposed criterion, which combines the strengths of both prediction
selection and stability selection and is therefore referred to as prediction
and stability selection (PASS). Selection consistency is established, and the
effectiveness of the method is demonstrated through simulation studies and an
application to prostate cancer data.
Yixin Fang, Ph.D., Division of Biostatistics, New York University
School of Medicine ~ September 26, 2013
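The PASS criterion itself is defined in the talk, not here; as a rough illustration of the two ingredients it reportedly combines, the following sketch computes subsample selection frequencies (stability) and cross-validated prediction error for the lasso over a grid of tuning parameters. All data, parameters, and the lasso itself are stand-ins.

```python
# Illustrative sketch only: PASS is not specified in the abstract. This shows
# the two quantities it reportedly trades off -- selection stability under
# subsampling and predictive performance -- using the lasso as the selector.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 100, 200                        # high-dimensional: p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                         # five truly active variables
y = X @ beta + rng.standard_normal(n)

def stability(lam, n_sub=50):
    """Selection frequency of each variable over random half-samples."""
    freq = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, n // 2, replace=False)
        freq += (Lasso(alpha=lam).fit(X[idx], y[idx]).coef_ != 0)
    return freq / n_sub

for lam in [0.05, 0.1, 0.3]:
    mse = -cross_val_score(Lasso(alpha=lam), X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    # A PASS-like rule would combine these two quantities; the actual
    # criterion is given in the talk/paper.
    print(f"lambda={lam}: CV-MSE={mse:.2f}, "
          f"max selection frequency={stability(lam).max():.2f}")
```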
|
Corrected Score Approach to
Censored Quantile Regression with Covariate
Measurement Errors:
Censored quantile regression has become an important alternative to the Cox
proportional hazards model in survival analysis. In contrast to the central
covariate effect obtained from mean-based hazard regression, quantile
regression can effectively characterize covariate effects at different
quantiles of the survival time. When covariates are measured with error, it
is known that naively treating mismeasured covariates as error-free results
in estimation bias. Under censored quantile regression, we propose corrected
estimating equations to obtain consistent estimators. We establish
consistency and asymptotic normality for the proposed estimators of the
quantile regression coefficients. Compared with the naive estimator, the
proposed method eliminates the estimation bias under various measurement-error
and model-error distributions. We conduct simulation studies to examine the
finite-sample properties of the new method and apply it to a lung cancer
study.
Guosheng Yin, Associate Professor, Department of Statistics and
Actuarial Science, The University of Hong Kong ~ October 7, 2013
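A minimal illustration (without censoring, and not the talk's corrected estimating equations) of the bias being addressed: naive median regression on an error-prone covariate attenuates the slope. The simulation design and statsmodels' QuantReg are my choices here.

```python
# Naive median regression with a mismeasured covariate: the slope estimate
# is attenuated toward zero, which is the bias the talk's corrected
# estimating equations are designed to remove.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)              # true covariate
w = x + 0.8 * rng.standard_normal(n)    # error-prone measurement of x
y = 1.0 + 2.0 * x + rng.standard_normal(n)

for name, cov in [("true x", x), ("naive w", w)]:
    fit = sm.QuantReg(y, sm.add_constant(cov)).fit(q=0.5)
    print(f"{name}: slope estimate = {fit.params[1]:.2f}  (truth: 2.0)")
# Expect roughly 2.0 with the true covariate and a clearly attenuated
# slope with the mismeasured one.
```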
|
Analysis of different M/G/1 batch-arrival queues subject to disasters:
In this talk we present two queueing systems with batch Poisson arrivals
subject to disasters. Queues with disasters are natural models for
applications in communication systems or manufacturing that are subject to
catastrophic failures, which force the system to restart after losing all
customers currently in service or waiting to be served. Disasters occur
according to a Poisson process; when one strikes, the system is cleared of
all customers and the server initiates a repair period. During the repair
period, arriving batches of customers accumulate in the queue without
receiving service. In the first model, whenever the system becomes empty the
server takes a string of vacations of random length, during which any
customers that arrive wait without receiving service. Vacation lengths are
i.i.d. random variables, and the number of vacations in each string is also
random. A string of vacations ends either when customers arrive or when it
reaches its (random) length. In the second model, upon service completion a
customer either immediately rejoins the tail of the queue as a feedback
customer, to receive another service, with probability r, or departs the
system forever with probability 1-r. When the system becomes empty, the
server takes single or multiple vacations. We analyze both systems using the
supplementary-variables technique and obtain the probability generating
function of the stationary queue-length distribution, the sojourn time of a
typical customer in the stationary regime, and the distribution of the busy
period.
George C. Mytalas, Department of Mathematical Sciences, New Jersey
Institute of Technology ~ October 10, 2012
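The talk's analysis proceeds through supplementary variables and generating functions; as a quick numerical companion, here is a minimal event-driven simulation of the disaster-and-repair mechanism. For brevity it uses exponential service (M/M/1-type rather than M/G/1) and geometric batch sizes, and omits the vacation and feedback features of the two models; all rates are arbitrary.

```python
# Batch Poisson arrivals, disasters that clear the queue, and a repair
# period during which arriving batches accumulate without service.
import numpy as np

rng = np.random.default_rng(2)
lam, mu, delta = 1.0, 5.0, 0.1         # batch-arrival, service, disaster rates
mean_repair = 2.0
T, t, q, area = 2e5, 0.0, 0, 0.0       # horizon, clock, queue, time integral

while t < T:
    rates = [lam, mu if q > 0 else 0.0, delta]
    total = sum(rates)
    dt = rng.exponential(1.0 / total)
    area += q * dt
    t += dt
    event = rng.choice(3, p=[r / total for r in rates])
    if event == 0:                      # batch arrival, geometric batch size
        q += rng.geometric(0.5)
    elif event == 1:                    # service completion
        q -= 1
    else:                               # disaster: clear system, then repair
        q = 0
        repair = rng.exponential(mean_repair)
        # batches arriving during repair wait out the remaining repair time
        for s in np.sort(rng.uniform(0.0, repair, rng.poisson(lam * repair))):
            b = rng.geometric(0.5)
            area += b * (repair - s)
            q += b
        t += repair

print(f"time-average queue length ~ {area / t:.2f}")
```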
|
Nonparametric Bayesian
Multiple Imputation for Incomplete Categorical Variables in
Large-Scale Assessment Surveys:
In many surveys, the data comprise a large number of
categorical variables that suffer from item nonresponse. Standard methods for
multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and
can be difficult to implement effectively in high dimensions. We present a
fully Bayesian, joint modeling approach to multiple imputation
for categorical data based on Dirichlet process
mixtures of multinomial distributions. The approach automatically models
complex dependencies while being computationally expedient. The Dirichlet process prior distributions enable analysts to
avoid fixing the number of mixture components at an arbitrary value. We
illustrate the repeated-sampling properties of the approach using simulated
data. We apply the methodology to impute missing background data in the 2007
Trends in International Mathematics and Science Study.
Yajuan Si, Department of Statistics, Columbia University; joint work with
Jerome P. Reiter, Department of Statistical Science, Duke University,
Durham ~ October 17, 2013
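A compact sketch (not the authors' code) of the model class: a truncated Dirichlet-process mixture of product-multinomials, fitted by Gibbs sampling, with missing entries imputed from their latent class. The truncation level K, concentration alpha, and the toy data are all assumptions.

```python
# Latent classes z_i ~ pi (stick-breaking), items x_ij | z_i=k ~ Cat(theta_kj);
# missing entries (coded -1) are ignored in fitting and imputed at the end.
import numpy as np

rng = np.random.default_rng(3)
n, J, C, K = 300, 5, 3, 20           # units, items, categories, DP truncation
alpha = 1.0                           # DP concentration (assumed)

# toy data: two latent groups, ~15% of entries missing
g = rng.integers(0, 2, n)
X = np.where(rng.random((n, J)) < 0.7, g[:, None], rng.integers(0, C, (n, J)))
X[rng.random((n, J)) < 0.15] = -1

z = rng.integers(0, K, n)                        # latent class labels
theta = rng.dirichlet(np.ones(C), size=(K, J))   # class/item category probs

for it in range(200):
    # stick-breaking weights: v_k ~ Beta(1 + n_k, alpha + n_{>k})
    cnt = np.bincount(z, minlength=K)
    tail = cnt[::-1].cumsum()[::-1] - cnt
    v = rng.beta(1 + cnt, alpha + tail)
    pi = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))

    # class memberships from observed entries only
    logp = np.tile(np.log(pi + 1e-300), (n, 1))
    for j in range(J):
        obs = X[:, j] >= 0
        logp[obs] += np.log(theta[:, j, X[obs, j]]).T
    w = np.exp(logp - logp.max(1, keepdims=True))
    w /= w.sum(1, keepdims=True)
    z = np.array([rng.choice(K, p=wi) for wi in w])

    # per-class category probabilities from Dirichlet posteriors
    for k in range(K):
        for j in range(J):
            sel = (z == k) & (X[:, j] >= 0)
            theta[k, j] = rng.dirichlet(1 + np.bincount(X[sel, j], minlength=C))

# one completed data set; repeating over retained draws gives the MI data sets
X_imp = X.copy()
for i, j in zip(*np.where(X == -1)):
    X_imp[i, j] = rng.choice(C, p=theta[z[i], j])
print("imputed", (X == -1).sum(), "missing entries")
```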
|
A Transformation Class for Spatio-temporal Survival Data with a Cure Fraction:
A hierarchical Bayesian methodology is proposed to model spatio-temporal
clustered survival data with the possibility of cure. A continuous
transformation class of survival curves indexed by a single parameter is
used. This transformation class contains as special cases two well-known
existing models, the Proportional Hazards (PH) and Proportional Odds (PO)
models. The survival curve is modeled as a function of a baseline cumulative
distribution function (cdf), cure rates, and spatio-temporal frailties. The
cure-rate model uses a covariate link specification, and the spatial
frailties follow a conditionally autoregressive (CAR) model with time-varying
parameters. The likelihood function is formulated assuming that the single
parameter controlling the transformation is unknown, and full conditional
distributions are derived. A model with a nonparametric baseline cdf is
implemented. We obtain the usual posterior estimates, smoothed by
regional-level maps. Finally, we apply our methodology to a SEER data set of
melanoma cancer patients diagnosed in the state of New Jersey between 2000
and 2007, with follow-up through 2007.
Sandra M. Hurtado Rúa, PhD, Division of Biostatistics and Epidemiology,
Department of Public Health, Cornell University ~ October 31, 2013
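The abstract does not spell out the transformation; one standard single-parameter family with PH and PO as endpoints is the Box-Cox-type class sketched below (possibly not the exact class used in the talk). The cure fraction, frailties, and covariate links are omitted; the baseline cumulative hazard H(t) = t is a placeholder.

```python
# S_r(t | x) = (1 + r * H(t) * exp(x'b))^(-1/r): proportional odds at r = 1,
# proportional hazards in the limit r -> 0.
import numpy as np

def surv(t, xb, r, H=lambda t: t):
    """Survival under the transformation indexed by r (H: baseline cum. hazard)."""
    if r == 0.0:                                  # PH limit as r -> 0
        return np.exp(-H(t) * np.exp(xb))
    return (1.0 + r * H(t) * np.exp(xb)) ** (-1.0 / r)

t = np.linspace(0.0, 3.0, 4)
for r in [0.0, 0.5, 1.0]:
    print(f"r={r}: S(t | x'b=0.3) =", np.round(surv(t, 0.3, r), 3))
```

In the cure-fraction setting, this curve would be mixed with a point mass of never-failing subjects, which is the part driven by the covariate link in the talk.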
|
Confidence bands for survival functions in the Cox regression framework:
Cox regression combined with semiparametric random censorship models provides
a powerful framework for obtaining improved parameter estimates (Mondal and
Subramanian, 2014). Here we exploit this methodology to construct several new
simultaneous confidence bands (SCBs) for subject-specific survival curves.
Simulation results are presented to compare the performance of the proposed
SCBs with competing ones based on standard Cox regression alone. The new SCBs
provide correct empirical coverage and are more informative. The proposed
methods extend easily to the case where censoring indicators may be missing
for a subset of study subjects.
Shoubhik Mondal, PhD Candidate in Statistics, Department of Mathematical
Sciences, New Jersey Institute of Technology ~ October 24, 2013
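The talk's SCBs are derived within the semiparametric random-censorship Cox framework; the generic sketch below only illustrates what "simultaneous" means: a sup-statistic over the time grid yields a critical value wider than the pointwise 1.96. The toy curve and the stand-in bootstrap replicates are assumptions.

```python
# Turning curve replicates into a simultaneous (sup-t) band over a grid.
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0.1, 5.0, 50)
S_hat = np.exp(-0.4 * grid)                       # toy survival estimate
# stand-in bootstrap replicates of the estimated curve
S_boot = S_hat + 0.02 * rng.standard_normal((500, grid.size)).cumsum(axis=1)

se = S_boot.std(axis=0)
sup = np.abs((S_boot - S_hat) / se).max(axis=1)   # sup-statistic per replicate
c = np.quantile(sup, 0.95)                        # simultaneous critical value
lower, upper = S_hat - c * se, S_hat + c * se     # 95% SCB over the whole grid
print(f"simultaneous critical value {c:.2f} vs pointwise 1.96")
```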
|
Principal Trend Analysis with Application to Time-course Genomic Data:
We present principal trend analysis, a new framework for computing a low-rank
approximation for a group of matrices with time-course structure. This
low-rank approximation is a generalization of principal component analysis.
Whereas principal component analysis ignores the time-course structure of the
data, our penalized decomposition approach yields a smooth representation of
the principal components, named principal trends. Moreover, the new
decomposition can produce sparse factor loadings, which facilitate
interpretation. The method is demonstrated through simulations and on real
time-course genomic data sets.
Yuping Zhang, PhD, Department of Biostatistics, Yale University ~
November 14, 2013
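A rough sketch in the spirit of penalized rank-one decomposition: alternate between a smoothed time factor (a "principal trend") and soft-thresholded (sparse) loadings. The moving-average smoother, the threshold value, and the single-matrix setting are simplifications; the talk's exact penalties and multi-matrix extension are not reproduced.

```python
# Alternating updates for one smooth trend u (time) and sparse loadings v
# (genes) of a T x G time-course matrix X ~ u v'.
import numpy as np

rng = np.random.default_rng(5)
T, G = 30, 100                               # time points, genes
trend = np.sin(np.linspace(0, np.pi, T))     # true smooth trend
load = np.zeros(G)
load[:10] = 1.0                              # sparse true loadings
X = np.outer(trend, load) + 0.3 * rng.standard_normal((T, G))

def smooth(u, w=0.5):
    """Roughness shrinkage via a simple moving average (illustrative)."""
    pad = np.r_[u[0], u, u[-1]]
    return (1 - w) * u + w * (pad[:-2] + pad[2:]) / 2

u = rng.standard_normal(T)
for _ in range(50):
    v = X.T @ u                                        # loadings update
    v = np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)  # soft-threshold: sparsity
    if np.linalg.norm(v) > 0:
        v /= np.linalg.norm(v)
    u = smooth(X @ v)                                  # smoothed trend update
    u /= np.linalg.norm(u)

print("nonzero loadings:", int(np.sum(v != 0)), "of", G)
```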
|
Is there a needle in the haystack? ART and non-standard asymptotics:
This talk discusses marginal screening for detecting the presence of
significant predictors in high-dimensional regression (is there a needle in
the haystack?). Screening large numbers of predictors is a challenging
problem due to the non-standard limiting behavior of post-model-selected
estimators. There is a common misconception that the oracle property for such
estimators is a panacea, but the oracle property only holds away from the
null hypothesis of interest in marginal screening. To address this
difficulty, we propose an adaptive resampling test (ART). Our approach
provides an alternative to the popular (yet conservative) Bonferroni method
of controlling familywise error rates. ART is adaptive in the sense that
thresholding is used to decide whether the centered percentile bootstrap
applies; otherwise it adapts to the non-standard asymptotics as tightly as
possible. The talk is based on joint work with Min Qian.
Ian McKeague, PhD, Department of Biostatistics, Columbia University ~
November 21, 2013
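A schematic version of the adaptive idea, not McKeague and Qian's exact construction: bootstrap the maximal marginal statistic, using the centered percentile bootstrap only when a threshold suggests the data are away from the null, since that bootstrap is unreliable at the null for post-selection maxima. The threshold choice and the null-regime permutation are placeholders.

```python
# Threshold decides between a centered bootstrap (non-null regime) and a
# null-calibrated resampling scheme (null regime) for the max marginal stat.
import numpy as np

rng = np.random.default_rng(6)
n, p, B = 200, 50, 500
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)              # toy data generated under the null

def max_stat(X, y):
    """Maximal absolute marginal statistic over all p predictors."""
    return np.abs(X.T @ y / np.sqrt(len(y))).max()

obs = max_stat(X, y)
thresh = np.sqrt(2 * np.log(p))         # illustrative threshold choice
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)         # nonparametric bootstrap sample
    if obs > thresh:
        # away from the null: centered percentile bootstrap is trustworthy
        stats[b] = max_stat(X[idx], y[idx]) - obs
    else:
        # near the null: permutation stand-in for ART's null-regime resampling
        stats[b] = max_stat(X, y[rng.permutation(n)])
p_val = np.mean(stats >= obs)
print(f"max statistic {obs:.2f}, threshold {thresh:.2f}, p-value ~ {p_val:.2f}")
```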
|
Copula-based Modeling and Computational Solutions of Warranty Cost
Management Problems:
Much recent research on modeling and optimization of servicing costs for
Non-Renewing Free Replacement Warranties (NR-FRW) assumes that a consumer's
usage profile is constant and known. Such an assumption is unrealistic for
moderately high-value consumer durables. In such cases, it is pragmatic to
assume that the manufacturer/seller is uncertain about any customer's fixed
usage rate of the product; the usage rate is instead modeled by a probability
distribution over the target customer population. This research seeks to
model and minimize the expected costs of pragmatic servicing strategies for
NR-FRW warranties, using a copula-based approach to capture the adverse
impact of an increasing product usage rate on its time-to-failure. Since
exact analytical solutions to these models are typically not obtainable,
numerical methods in MATLAB, together with the simulated annealing algorithm
for globally optimal cost minimization, are used for the computational
solution. These methods and results are compared with those obtained from a
well-known benchmark numerical example, and new results are then derived.
Sonia Bandha, Department of Mathematical Sciences, New Jersey Institute of
Technology ~ November 7, 2013
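A Monte Carlo sketch of the coupling idea, with an assumed copula family, assumed marginals, and a deliberately crude cost rule: usage rate and time-to-failure are tied through a Clayton copula, with the lifetime quantile flipped so that heavier usage means earlier failure. The outer optimization over servicing strategies (the simulated annealing in MATLAB) is not reproduced.

```python
# Clayton copula sampling via conditional inversion, then map to assumed
# gamma (usage rate) and Weibull (lifetime) marginals; estimate the expected
# NR-FRW cost of freely replacing units that fail within the warranty W.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta, W, c_repl, N = 2.0, 1.0, 100.0, 200_000  # copula par., warranty, cost

u = rng.random(N)                               # usage-rate quantile
t = rng.random(N)
v = (u ** -theta * (t ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
life_q = 1.0 - v                                # flip: high usage, short life

usage = stats.gamma(a=2.0, scale=0.5).ppf(u)            # assumed usage marginal
life = stats.weibull_min(c=1.5, scale=2.0).ppf(life_q)  # assumed lifetime marginal

print(f"usage/lifetime correlation: {np.corrcoef(usage, life)[0, 1]:.2f}")
exp_cost = c_repl * np.mean(life <= W)  # free replacement of in-warranty failures
print(f"estimated expected warranty cost per unit sold: {exp_cost:.2f}")
```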
|
Reconstructing the RNA structurome from sequencing data:
Next-generation sequencing coupled with nuclease digestion is emerging as a
way to dissect the structures of thousands of RNAs (the RNA structurome)
simultaneously. However, it remains challenging to predict RNA secondary
structures accurately. We present a novel approach, SeqFold, for the
reconstruction of the RNA structurome by statistical modeling of sequencing
data. Using the known structures of a wide range of mRNAs and noncoding RNAs
as benchmarks, we demonstrate that SeqFold is more accurate and robust than
traditional approaches. The reconstructed landscape of the RNA structurome
shows widespread impacts on genome-wide post-transcriptional regulation.
Zhengqin Ouyang, PhD, The Jackson Laboratory for Genomic Medicine,
University of Connecticut ~ December 5, 2013
|