On Some Issues of Optimality in Multiple Hypotheses Testing

Edsel Pena
Department of Statistics
University of South Carolina

4:00-5:00 pm
Thursday, November 6, 2008
315 Riddick, NCSU Campus

High-dimensional data, characterized by a large number (M) of variables or characteristics but usually a smaller number (n) of replications or samples per variable, arise in many fields, notably biology and medicine. The growing number of such "large M, small n" data sets can be attributed to advances in high-throughput technology, notably microarray technology, and has led to the development of statistical methods appropriate for such data sets. A specific statistical problem that arises is that of multiple testing or multiple decision-making: for each variable there are two competing hypotheses or two competing actions, so the problem is to test M pairs of hypotheses simultaneously (or, equivalently, to choose among 2^M possible actions) based on the high-dimensional data. Such multiple testing problems require recognizing the impact of multiplicity, so the relevant "Type I Error" rate can be defined in several ways, such as the family-wise error rate (FWER) or the false discovery rate (FDR). The goal is to use test/decision functions that control the chosen Type I Error rate while also minimizing some measure of "Type II Error" or, equivalently, maximizing some measure of "power." Many procedures currently used to control the FWER or the FDR rely on the set of p-values of the M individual tests, such as the Sidak procedure for FWER control or the popular Benjamini-Hochberg (BH) procedure for FDR control. These procedures, however, do not exploit the possibly differing powers of the individual tests. In this talk I will present some recent results in which the power of the multiple-testing procedure (for both FWER and FDR control) is enhanced by exploiting the individual powers of the M tests. This is joint work with J. Habiger and A. McLain.
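
For readers unfamiliar with the two p-value-based procedures named in the abstract, the following is a minimal Python sketch of their standard textbook forms. It is not the method presented in the talk. The function names `sidak_reject` and `bh_reject`, the default levels, and the example p-values are illustrative assumptions. The Sidak threshold shown controls the FWER under independence of the tests, and the BH step-up rule controls the FDR under independence (or certain forms of positive dependence).

```python
def sidak_reject(pvalues, alpha=0.05):
    """Single-step Sidak procedure (hypothetical helper name).

    Rejects hypothesis i when p_i <= 1 - (1 - alpha)**(1/M), which keeps
    the FWER at alpha when the M tests are independent.
    """
    m = len(pvalues)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in pvalues]


def bh_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (hypothetical helper name).

    Finds the largest rank k with p_(k) <= q * k / M and rejects the
    hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    # Indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Illustrative p-values only (not data from the talk).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(sidak_reject(pvals)), "rejections with Sidak")
print(sum(bh_reject(pvals)), "rejections with BH")
```

Both rules apply thresholds that depend only on M and the ranks of the p-values, so every test is treated identically regardless of its power; this is the limitation the abstract refers to.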

