Variable selection is an essential component of modern data analysis. Starting
with a large number of variables, possibly larger than the number of
observations, the aim is to determine a smaller subset that exhibits
the strongest effects. Variable selection has been studied extensively
in the literature. Classical methods treat the predictor
variables "flatly," supposing all subsets of variables to be equally
suitable for use in a multivariate model. However, in most science and
engineering applications, measurements are structured in one or more
ways. Incorporating such structural information into the modeling
procedure poses interesting and challenging questions.
In this talk, I plan to consider the following two types of model
structure:
1. Heredity structure. Interpreting regression models with interaction
terms often requires that the corresponding main effects also be
considered. Through the use of such heredity constraints, we develop
new variable selection methods for fitting a predictive model while
simultaneously identifying important interaction terms. Such techniques
are likely to be important in, for example, the study of complex
diseases, such as cancer, which involve multiple genetic and
environmental risk factors, with scientists particularly interested in
their interactions.
2. Grouping structure. In many engineering and scientific applications,
input variables are grouped, for example, in biological applications
where assayed genes or proteins can be grouped by biological role.
Common statistical analysis methods such as ANOVA, factor analysis, and
functional modeling with partially ordered basis sets also exhibit
natural variable groupings. We develop variable selection techniques
that respect group constraints. Our new methods enjoy benefits that
existing successful methods do not have, while offering the potential
for achieving a theoretical "oracle" property.
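As a rough illustration of the heredity idea described above, the sketch
below lists which pairwise interaction terms remain eligible once a set
of main effects has been chosen. It is not the speaker's method: the
function name heredity_candidates, the variable names, and the "strong"
versus "weak" switch are assumptions introduced only for this example.
In the strong form, an interaction is allowed only when both of its main
effects are selected; the weak form (one main effect suffices) is shown
for contrast.

```python
from itertools import combinations


def heredity_candidates(selected_main, all_vars, strong=True):
    """Return pairwise interactions permitted by a heredity constraint.

    Hypothetical helper for illustration only.
    strong=True  : both parent main effects must be selected.
    strong=False : at least one parent main effect must be selected.
    """
    selected = set(selected_main)
    allowed = []
    for a, b in combinations(all_vars, 2):
        # Count how many of the two parent main effects are in the model.
        n_parents = (a in selected) + (b in selected)
        if n_parents == 2 or (not strong and n_parents == 1):
            allowed.append((a, b))
    return allowed


variables = ["x1", "x2", "x3", "x4"]   # hypothetical predictor names
main_effects = ["x1", "x2"]            # hypothetical selected main effects

strong_pairs = heredity_candidates(main_effects, variables, strong=True)
weak_pairs = heredity_candidates(main_effects, variables, strong=False)
```

Here strong_pairs contains only the x1-x2 interaction, while weak_pairs
contains every pair except x3-x4, since neither x3 nor x4 was selected
as a main effect.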
Friday, March 30, 2007
3:35 - 4:35 pm
206 Cox Hall
Refreshments will be served on the second floor
of Dabney Hall (left of Room 222) at 3:00 pm.