The impact of dependence is an important topic in large-scale multiple testing and has been studied extensively in the literature. However, the discussion has focused on the validity issue, while the important optimality issue has been largely ignored. This talk considers multiple testing under dependence in a compound decision theoretic framework. For data generated from an underlying two-state hidden Markov model (HMM), we construct oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint on the false discovery rate (FDR). Both the theoretical properties and the numerical performance of the proposed procedures are investigated. It is shown that the proposed procedures control the FDR at the desired level, enjoy certain optimality properties, and are especially powerful in identifying clustered non-null cases. The results show that the power of tests can be substantially improved by adaptively exploiting the dependence structure among hypotheses; hence conventional FDR procedures that ignore this structural information are inefficient. Extensions beyond the HMM for set-wise inference and pattern identification, as well as applications in spatial data and time-course data analyses, will be discussed if time permits.
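To give a concrete sense of how an HMM-based FDR procedure of this flavor can work, below is a minimal sketch, not the speaker's actual procedure. It assumes a two-state Gaussian HMM with known (oracle) parameters, ranks hypotheses by the posterior probability of being null computed with the forward-backward algorithm, and applies a step-up rule that keeps the estimated FDR below the nominal level; a data-driven version would first estimate the HMM parameters (e.g., by EM). The function names (`forward_backward`, `step_up_fdr`) and the simulation settings are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def forward_backward(x, pi, A, f0, f1):
    """Posterior P(state = null | all data) for a two-state HMM.

    pi: initial distribution over (null, non-null); A: 2x2 transition matrix;
    f0, f1: emission densities under the null and non-null states.
    """
    n = len(x)
    emis = np.column_stack([f0(x), f1(x)])   # n x 2 emission likelihoods
    alpha = np.zeros((n, 2))
    beta = np.zeros((n, 2))
    # forward pass (scaled at each step to avoid underflow)
    alpha[0] = pi * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * emis[t]
        alpha[t] /= alpha[t].sum()
    # backward pass (scaled as well; scaling cancels after normalization)
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]                        # posterior null probability per test

def step_up_fdr(null_prob, alpha_level):
    """Reject the hypotheses with the smallest posterior null probabilities,
    taking the largest rejection set whose average null probability
    (an estimate of the FDR) stays at or below alpha_level."""
    order = np.argsort(null_prob)
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, len(null_prob) + 1)
    ok = np.nonzero(running_mean <= alpha_level)[0]
    k = ok.max() + 1 if ok.size else 0
    reject = np.zeros(len(null_prob), dtype=bool)
    reject[order[:k]] = True
    return reject

# Illustration with oracle parameters: null N(0, 1), non-null N(2, 1),
# and a transition matrix that produces clustered non-null cases.
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.05], [0.20, 0.80]])
states = [0]
for _ in range(999):
    states.append(rng.choice(2, p=A[states[-1]]))
states = np.array(states)
x = rng.normal(loc=2.0 * states, scale=1.0)

post_null = forward_backward(x, np.array([0.95, 0.05]), A,
                             lambda z: norm.pdf(z, 0, 1),
                             lambda z: norm.pdf(z, 2, 1))
rejections = step_up_fdr(post_null, alpha_level=0.10)
```

Because the posterior null probabilities borrow strength from neighboring observations through the transition matrix, a run of moderately large observations can be flagged even when each one would be unremarkable on its own, which is the intuition behind the improved power for clustered signals described above.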