A new criterion for variable selection in high-dimensional problems:
High-dimensional statistical problems arise in many fields, such as medical
studies, genomics, finance, and machine learning. Variable selection plays an
important role in high-dimensional statistical analysis, and many criteria
have been proposed in the statistical literature. In this talk I will discuss
a newly proposed criterion, which combines the strengths of both prediction
selection and stability selection and is therefore referred to as prediction
and stability selection (PASS). Selection consistency is established, and the
effectiveness of the method is demonstrated through simulation studies and an
application to prostate cancer data.
Yixin Fang, Ph.D., Division of Biostatistics, New York University
School of Medicine ~ September 26, 2013
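The PASS criterion itself is defined in the talk, not here; as a rough illustration of the two ingredients it reportedly combines, the following sketch computes subsample selection frequencies (stability) and cross-validated prediction error for the lasso over a grid of tuning parameters. All data, parameters, and the lasso itself are stand-ins.

```python
# Illustrative sketch only: PASS is not specified in the abstract. This shows
# the two quantities it reportedly trades off -- selection stability under
# subsampling and predictive performance -- using the lasso as the selector.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 100, 200                        # high-dimensional: p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                         # five truly active variables
y = X @ beta + rng.standard_normal(n)

def stability(lam, n_sub=50):
    """Selection frequency of each variable over random half-samples."""
    freq = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, n // 2, replace=False)
        freq += (Lasso(alpha=lam).fit(X[idx], y[idx]).coef_ != 0)
    return freq / n_sub

for lam in [0.05, 0.1, 0.3]:
    mse = -cross_val_score(Lasso(alpha=lam), X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    # A PASS-like rule would combine these two quantities; the actual
    # criterion is given in the talk/paper.
    print(f"lambda={lam}: CV-MSE={mse:.2f}, "
          f"max selection frequency={stability(lam).max():.2f}")
```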
|
Corrected Score Approach to
Censored Quantile Regression with Covariate
Measurement Errors:
Censored quantile regression has become an important alternative to the Cox
proportional hazards model in survival analysis. In contrast to the central
covariate effect obtained from mean-based hazard regression, quantile
regression can effectively characterize covariate effects at different
quantiles of the survival time. When covariates are measured with error, it
is known that naively treating mismeasured covariates as error-free results
in estimation bias. Under censored quantile regression, we propose corrected
estimating equations to obtain consistent estimators. We establish
consistency and asymptotic normality for the proposed estimators of the
quantile regression coefficients. Compared with the naive estimator, the
proposed method eliminates the estimation bias under various measurement-error
and model-error distributions. We conduct simulation studies to examine the
finite-sample properties of the new method and apply it to a lung cancer
study.
Guosheng Yin, Associate Professor, Department of Statistics and
Actuarial Science, The University of Hong Kong ~ October 7, 2013
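A minimal illustration (without censoring, and not the talk's corrected estimating equations) of the bias being addressed: naive median regression on an error-prone covariate attenuates the slope. The simulation design and statsmodels' QuantReg are my choices here.

```python
# Naive median regression with a mismeasured covariate: the slope estimate
# is attenuated toward zero, which is the bias the talk's corrected
# estimating equations are designed to remove.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)              # true covariate
w = x + 0.8 * rng.standard_normal(n)    # error-prone measurement of x
y = 1.0 + 2.0 * x + rng.standard_normal(n)

for name, cov in [("true x", x), ("naive w", w)]:
    fit = sm.QuantReg(y, sm.add_constant(cov)).fit(q=0.5)
    print(f"{name}: slope estimate = {fit.params[1]:.2f}  (truth: 2.0)")
# Expect roughly 2.0 with the true covariate and a clearly attenuated
# slope with the mismeasured one.
```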
|
Analysis of different M/G/1 batch-arrival queues subject to disasters:
In this talk we present two queueing systems with batch Poisson arrivals
subject to disasters. Queues with disasters are natural models for
applications in communication systems or manufacturing that are subject to
catastrophic failures, which force the system to restart after losing all
customers currently in service or waiting to be served. Disasters occur
according to a Poisson process; when one strikes, the system is cleared of
all customers and the server initiates a repair period. During the repair
period, arriving batches of customers accumulate in the queue without
receiving service. In the first model, whenever the system becomes empty the
server takes a string of vacations of random length, during which any
customers that arrive wait without receiving service. Vacation lengths are
i.i.d. random variables, and the number of vacations in each string is also
random. A string of vacations ends either when customers arrive or when it
reaches its (random) length. In the second model, upon service completion a
customer either immediately rejoins the tail of the queue as a feedback
customer, to receive another service, with probability r, or departs the
system forever with probability 1-r. When the system becomes empty, the
server takes single or multiple vacations. We analyze both systems using the
supplementary-variables technique and obtain the probability generating
function of the stationary queue-length distribution, the sojourn time of a
typical customer in the stationary regime, and the distribution of the busy
period.
George C. Mytalas, Department of Mathematical Sciences, New Jersey
Institute of Technology ~ October 10, 2012
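The talk's analysis proceeds through supplementary variables and generating functions; as a quick numerical companion, here is a minimal event-driven simulation of the disaster-and-repair mechanism. For brevity it uses exponential service (M/M/1-type rather than M/G/1) and geometric batch sizes, and omits the vacation and feedback features of the two models; all rates are arbitrary.

```python
# Batch Poisson arrivals, disasters that clear the queue, and a repair
# period during which arriving batches accumulate without service.
import numpy as np

rng = np.random.default_rng(2)
lam, mu, delta = 1.0, 5.0, 0.1         # batch-arrival, service, disaster rates
mean_repair = 2.0
T, t, q, area = 2e5, 0.0, 0, 0.0       # horizon, clock, queue, time integral

while t < T:
    rates = [lam, mu if q > 0 else 0.0, delta]
    total = sum(rates)
    dt = rng.exponential(1.0 / total)
    area += q * dt
    t += dt
    event = rng.choice(3, p=[r / total for r in rates])
    if event == 0:                      # batch arrival, geometric batch size
        q += rng.geometric(0.5)
    elif event == 1:                    # service completion
        q -= 1
    else:                               # disaster: clear system, then repair
        q = 0
        repair = rng.exponential(mean_repair)
        # batches arriving during repair wait out the remaining repair time
        for s in np.sort(rng.uniform(0.0, repair, rng.poisson(lam * repair))):
            b = rng.geometric(0.5)
            area += b * (repair - s)
            q += b
        t += repair

print(f"time-average queue length ~ {area / t:.2f}")
```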
|
Nonparametric Bayesian
Multiple Imputation for Incomplete Categorical Variables in
Large-Scale Assessment Surveys:
In many surveys, the data comprise a large number of
categorical variables that suffer from item nonresponse. Standard methods for
multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and
can be difficult to implement effectively in high dimensions. We present a
fully Bayesian, joint modeling approach to multiple imputation
for categorical data based on Dirichlet process
mixtures of multinomial distributions. The approach automatically models
complex dependencies while being computationally expedient. The Dirichlet process prior distributions enable analysts to
avoid fixing the number of mixture components at an arbitrary value. We
illustrate the repeated-sampling properties of the approach using simulated
data. We apply the methodology to impute missing background data in the 2007
Trends in International Mathematics and Science Study.
Yajuan Si, Department of Statistics, Columbia University; joint work with
Jerome P. Reiter, Department of Statistical Science, Duke University,
Durham ~ October 17, 2013
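A compact sketch (not the authors' code) of the model class: a truncated Dirichlet-process mixture of product-multinomials, fitted by Gibbs sampling, with missing entries imputed from their latent class. The truncation level K, concentration alpha, and the toy data are all assumptions.

```python
# Latent classes z_i ~ pi (stick-breaking), items x_ij | z_i=k ~ Cat(theta_kj);
# missing entries (coded -1) are ignored in fitting and imputed at the end.
import numpy as np

rng = np.random.default_rng(3)
n, J, C, K = 300, 5, 3, 20           # units, items, categories, DP truncation
alpha = 1.0                           # DP concentration (assumed)

# toy data: two latent groups, ~15% of entries missing
g = rng.integers(0, 2, n)
X = np.where(rng.random((n, J)) < 0.7, g[:, None], rng.integers(0, C, (n, J)))
X[rng.random((n, J)) < 0.15] = -1

z = rng.integers(0, K, n)                        # latent class labels
theta = rng.dirichlet(np.ones(C), size=(K, J))   # class/item category probs

for it in range(200):
    # stick-breaking weights: v_k ~ Beta(1 + n_k, alpha + n_{>k})
    cnt = np.bincount(z, minlength=K)
    tail = cnt[::-1].cumsum()[::-1] - cnt
    v = rng.beta(1 + cnt, alpha + tail)
    pi = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))

    # class memberships from observed entries only
    logp = np.tile(np.log(pi + 1e-300), (n, 1))
    for j in range(J):
        obs = X[:, j] >= 0
        logp[obs] += np.log(theta[:, j, X[obs, j]]).T
    w = np.exp(logp - logp.max(1, keepdims=True))
    w /= w.sum(1, keepdims=True)
    z = np.array([rng.choice(K, p=wi) for wi in w])

    # per-class category probabilities from Dirichlet posteriors
    for k in range(K):
        for j in range(J):
            sel = (z == k) & (X[:, j] >= 0)
            theta[k, j] = rng.dirichlet(1 + np.bincount(X[sel, j], minlength=C))

# one completed data set; repeating over retained draws gives the MI data sets
X_imp = X.copy()
for i, j in zip(*np.where(X == -1)):
    X_imp[i, j] = rng.choice(C, p=theta[z[i], j])
print("imputed", (X == -1).sum(), "missing entries")
```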
|
A Transformation Class for Spatio-temporal Survival Data with a Cure Fraction:
A hierarchical Bayesian methodology is proposed to model spatio-temporal
clustered survival data with the possibility of cure. A continuous
transformation class of survival curves indexed by a single parameter is
used. This transformation class contains as special cases two well-known
existing models, the Proportional Hazards (PH) and Proportional Odds (PO)
models. The survival curve is modeled as a function of a baseline cumulative
distribution function (cdf), cure rates, and spatio-temporal frailties. The
cure-rate model uses a covariate link specification, and the spatial
frailties follow a conditionally autoregressive (CAR) model with time-varying
parameters. The likelihood function is formulated assuming that the single
parameter controlling the transformation is unknown, and full conditional
distributions are derived. A model with a nonparametric baseline cdf is
implemented. We obtain the usual posterior estimates, smoothed by
regional-level maps. Finally, we apply our methodology to a SEER data set of
melanoma cancer patients diagnosed in the state of New Jersey between 2000
and 2007, with follow-up through 2007.
Sandra M. Hurtado Rúa, PhD, Division of Biostatistics and Epidemiology,
Department of Public Health, Cornell University ~ October 31, 2013
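The abstract does not spell out the transformation; one standard single-parameter family with PH and PO as endpoints is the Box-Cox-type class sketched below (possibly not the exact class used in the talk). The cure fraction, frailties, and covariate links are omitted; the baseline cumulative hazard H(t) = t is a placeholder.

```python
# S_r(t | x) = (1 + r * H(t) * exp(x'b))^(-1/r): proportional odds at r = 1,
# proportional hazards in the limit r -> 0.
import numpy as np

def surv(t, xb, r, H=lambda t: t):
    """Survival under the transformation indexed by r (H: baseline cum. hazard)."""
    if r == 0.0:                                  # PH limit as r -> 0
        return np.exp(-H(t) * np.exp(xb))
    return (1.0 + r * H(t) * np.exp(xb)) ** (-1.0 / r)

t = np.linspace(0.0, 3.0, 4)
for r in [0.0, 0.5, 1.0]:
    print(f"r={r}: S(t | x'b=0.3) =", np.round(surv(t, 0.3, r), 3))
```

In the cure-fraction setting, this curve would be mixed with a point mass of never-failing subjects, which is the part driven by the covariate link in the talk.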
|
Confidence bands for survival functions in the Cox regression framework:
Cox regression combined with semiparametric random censorship models provides
a powerful framework for obtaining improved parameter estimates (Mondal and
Subramanian, 2014). Here we exploit this methodology to construct several new
simultaneous confidence bands (SCBs) for subject-specific survival curves.
Simulation results are presented to compare the performance of the proposed
SCBs with competing ones based on standard Cox regression alone. The new SCBs
provide correct empirical coverage and are more informative. The proposed
methods extend easily to the case where censoring indicators may be missing
for a subset of study subjects.
Shoubhik Mondal, PhD Candidate in Statistics, Department of Mathematical
Sciences, New Jersey Institute of Technology ~ October 24, 2013
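The talk's SCBs are derived within the semiparametric random-censorship Cox framework; the generic sketch below only illustrates what "simultaneous" means: a sup-statistic over the time grid yields a critical value wider than the pointwise 1.96. The toy curve and the stand-in bootstrap replicates are assumptions.

```python
# Turning curve replicates into a simultaneous (sup-t) band over a grid.
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0.1, 5.0, 50)
S_hat = np.exp(-0.4 * grid)                       # toy survival estimate
# stand-in bootstrap replicates of the estimated curve
S_boot = S_hat + 0.02 * rng.standard_normal((500, grid.size)).cumsum(axis=1)

se = S_boot.std(axis=0)
sup = np.abs((S_boot - S_hat) / se).max(axis=1)   # sup-statistic per replicate
c = np.quantile(sup, 0.95)                        # simultaneous critical value
lower, upper = S_hat - c * se, S_hat + c * se     # 95% SCB over the whole grid
print(f"simultaneous critical value {c:.2f} vs pointwise 1.96")
```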
|
Principal Trend Analysis with Application to Time-course Genomic Data:
We present principal trend analysis, a new framework for computing a low-rank
approximation for a group of matrices with time-course structure. This
low-rank approximation is a generalization of principal component analysis.
Whereas principal component analysis ignores the time-course structure of the
data, our penalized decomposition approach yields a smooth representation of
the principal components, named principal trends. Moreover, the new
decomposition can produce sparse factor loadings, which facilitate
interpretation. The method is demonstrated through simulations and on real
time-course genomic data sets.
Yuping Zhang, PhD, Department of Biostatistics, Yale University ~
November 14, 2013
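A rough sketch in the spirit of penalized rank-one decomposition: alternate between a smoothed time factor (a "principal trend") and soft-thresholded (sparse) loadings. The moving-average smoother, the threshold value, and the single-matrix setting are simplifications; the talk's exact penalties and multi-matrix extension are not reproduced.

```python
# Alternating updates for one smooth trend u (time) and sparse loadings v
# (genes) of a T x G time-course matrix X ~ u v'.
import numpy as np

rng = np.random.default_rng(5)
T, G = 30, 100                               # time points, genes
trend = np.sin(np.linspace(0, np.pi, T))     # true smooth trend
load = np.zeros(G)
load[:10] = 1.0                              # sparse true loadings
X = np.outer(trend, load) + 0.3 * rng.standard_normal((T, G))

def smooth(u, w=0.5):
    """Roughness shrinkage via a simple moving average (illustrative)."""
    pad = np.r_[u[0], u, u[-1]]
    return (1 - w) * u + w * (pad[:-2] + pad[2:]) / 2

u = rng.standard_normal(T)
for _ in range(50):
    v = X.T @ u                                        # loadings update
    v = np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)  # soft-threshold: sparsity
    if np.linalg.norm(v) > 0:
        v /= np.linalg.norm(v)
    u = smooth(X @ v)                                  # smoothed trend update
    u /= np.linalg.norm(u)

print("nonzero loadings:", int(np.sum(v != 0)), "of", G)
```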
|
Is there a needle in the haystack? ART and non-standard asymptotics:
This talk discusses marginal screening for detecting the presence of
significant predictors in high-dimensional regression (is there a needle in
the haystack?). Screening large numbers of predictors is a challenging
problem due to the non-standard limiting behavior of post-model-selected
estimators. There is a common misconception that the oracle property for such
estimators is a panacea, but the oracle property only holds away from the
null hypothesis of interest in marginal screening. To address this
difficulty, we propose an adaptive resampling test (ART). Our approach
provides an alternative to the popular (yet conservative) Bonferroni method
of controlling familywise error rates. ART is adaptive in the sense that
thresholding is used to decide whether the centered percentile bootstrap
applies; otherwise it adapts to the non-standard asymptotics as tightly as
possible. The talk is based on joint work with Min Qian.
Ian McKeague, PhD, Department of Biostatistics, Columbia University ~
November 21, 2013
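A schematic version of the adaptive idea, not McKeague and Qian's exact construction: bootstrap the maximal marginal statistic, using the centered percentile bootstrap only when a threshold suggests the data are away from the null, since that bootstrap is unreliable at the null for post-selection maxima. The threshold choice and the null-regime permutation are placeholders.

```python
# Threshold decides between a centered bootstrap (non-null regime) and a
# null-calibrated resampling scheme (null regime) for the max marginal stat.
import numpy as np

rng = np.random.default_rng(6)
n, p, B = 200, 50, 500
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)              # toy data generated under the null

def max_stat(X, y):
    """Maximal absolute marginal statistic over all p predictors."""
    return np.abs(X.T @ y / np.sqrt(len(y))).max()

obs = max_stat(X, y)
thresh = np.sqrt(2 * np.log(p))         # illustrative threshold choice
stats = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)         # nonparametric bootstrap sample
    if obs > thresh:
        # away from the null: centered percentile bootstrap is trustworthy
        stats[b] = max_stat(X[idx], y[idx]) - obs
    else:
        # near the null: permutation stand-in for ART's null-regime resampling
        stats[b] = max_stat(X, y[rng.permutation(n)])
p_val = np.mean(stats >= obs)
print(f"max statistic {obs:.2f}, threshold {thresh:.2f}, p-value ~ {p_val:.2f}")
```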
|
Copula-based Modeling and Computational Solutions of Warranty Cost
Management Problems:
Much recent research on modeling and optimization of servicing costs for
Non-Renewing Free Replacement Warranties (NR-FRW) assumes that a consumer's
usage profile is constant and known. Such an assumption is unrealistic for
moderately high-value consumer durables. In such cases, it is pragmatic to
assume that the manufacturer/seller is uncertain about any customer's fixed
usage rate of the product; the usage rate is instead modeled by a probability
distribution over the target customer population. This research seeks to
model and minimize the expected costs of pragmatic servicing strategies for
NR-FRW warranties, using a copula-based approach to capture the adverse
impact of an increasing product usage rate on its time-to-failure. Since
exact analytical solutions to these models are typically not obtainable,
numerical methods in MATLAB, together with the simulated annealing algorithm
for globally optimal cost minimization, are used for the computational
solution. These methods and results are compared with those obtained from a
well-known benchmark numerical example, and new results are then derived.
Sonia Bandha, Department of Mathematical Sciences, New Jersey Institute of
Technology ~ November 7, 2013
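A Monte Carlo sketch of the coupling idea, with an assumed copula family, assumed marginals, and a deliberately crude cost rule: usage rate and time-to-failure are tied through a Clayton copula, with the lifetime quantile flipped so that heavier usage means earlier failure. The outer optimization over servicing strategies (the simulated annealing in MATLAB) is not reproduced.

```python
# Clayton copula sampling via conditional inversion, then map to assumed
# gamma (usage rate) and Weibull (lifetime) marginals; estimate the expected
# NR-FRW cost of freely replacing units that fail within the warranty W.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta, W, c_repl, N = 2.0, 1.0, 100.0, 200_000  # copula par., warranty, cost

u = rng.random(N)                               # usage-rate quantile
t = rng.random(N)
v = (u ** -theta * (t ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
life_q = 1.0 - v                                # flip: high usage, short life

usage = stats.gamma(a=2.0, scale=0.5).ppf(u)            # assumed usage marginal
life = stats.weibull_min(c=1.5, scale=2.0).ppf(life_q)  # assumed lifetime marginal

print(f"usage/lifetime correlation: {np.corrcoef(usage, life)[0, 1]:.2f}")
exp_cost = c_repl * np.mean(life <= W)  # free replacement of in-warranty failures
print(f"estimated expected warranty cost per unit sold: {exp_cost:.2f}")
```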
|
Reconstructing the RNA structurome from sequencing data:
Next-generation sequencing coupled with nuclease digestion is emerging as a
way to dissect the structures of thousands of RNAs (the RNA structurome)
simultaneously. However, it remains challenging to predict RNA secondary
structures accurately. We present a novel approach, SeqFold, for the
reconstruction of the RNA structurome by statistical modeling of sequencing
data. Using the known structures of a wide range of mRNAs and noncoding RNAs
as benchmarks, we demonstrate that SeqFold is more accurate and robust than
traditional approaches. The reconstructed landscape of the RNA structurome
shows widespread impacts on genome-wide post-transcriptional regulation.
Zhengqin Ouyang, PhD, The Jackson Laboratory for Genomic Medicine,
University of Connecticut ~ December 5, 2013
|