Department of Statistics Seminar
North Carolina State University

presents

Daniela Witten

Department of Biostatistics
University of Washington

The cluster graphical lasso for improved estimation of Gaussian graphical models

Abstract

We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an $\ell_1$ penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an $\ell_1$-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set and a university webpage data set.
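
For readers who would like a concrete picture of step (1), the following is a minimal sketch, not the speaker's implementation. It assumes the known screening property that the connected components of the graphical lasso estimate at penalty level lam coincide with those of the graph linking variables i and j whenever |S_ij| > lam, which is the same partition obtained by cutting a single linkage dendrogram built on the similarities |S_ij| at lam. The function name `connected_components`, the toy matrix `S`, and the penalty values are illustrative assumptions.

```python
def connected_components(S, lam):
    """Partition variables into connected components of the graph
    that has an edge (i, j) whenever |S[i][j]| > lam."""
    p = len(S)
    parent = list(range(p))  # union-find forest, one node per variable

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(S[i][j]) > lam:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 4-variable sample covariance matrix (illustrative values only).
S = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.5, 0.1],
    [0.1, 0.5, 1.0, 0.05],
    [0.0, 0.1, 0.05, 1.0],
]

components = connected_components(S, lam=0.3)
print(components)
```

In this toy example variables 0 and 2 end up in the same component only because both are linked to variable 1, which is the chaining behavior typical of single linkage. Step (2) would then fit a graphical lasso separately on each component; the cluster graphical lasso described in the abstract would replace step (1) with a different clustering method.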

This is joint work with Kean Ming Tan and Ali Shojaie.

Friday, 22 February
3:00pm - 4:00pm
2203 SAS Hall

Refreshments will be served in the 5th floor commons at 2:30pm.
NOTE: No food or drink is allowed in any of the classrooms in SAS Hall.