presents
Mark Culp
from West Virginia University
On The Semi-Supervised Joint Trained Elastic Net
Abstract
The elastic net (supervised enet henceforth) is a popular and
computationally efficient approach for performing the simultaneous tasks
of variable selection, decorrelation, and coefficient shrinkage in the
linear regression setting. Semi-supervised regression,
currently unrelated to the supervised enet, uses data with missing
response values (unlabeled) along with labeled data to train the
estimator. In this talk, the joint trained elastic net is presented, which
incorporates the benefits of semi-supervised regression with the
supervised enet.
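For concreteness, the supervised elastic net described above can be fit with off-the-shelf tools. The following is a minimal scikit-learn sketch on synthetic data; the data and the penalty parameters are illustrative assumptions, not taken from the talk:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: 50 labeled observations, 20 features, only a few informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)   # two strongly correlated columns
beta = np.zeros(20)
beta[0], beta[2] = 2.0, -1.5                     # sparse true coefficient vector
y = X @ beta + 0.1 * rng.normal(size=50)

# l1_ratio mixes the L1 (lasso) and L2 (ridge) penalties, which is what gives
# the elastic net simultaneous variable selection and coefficient shrinkage.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))       # how many variables survive
```

The L1 part zeroes out weak coefficients (selection), while the L2 part stabilizes the fit when columns are correlated, as with the two columns constructed above.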
The supervised elastic net and other approaches like it rely on shrinking
the linear estimator in a way that simultaneously performs variable
selection and decorrelates the data. Both the variable selection and
decorrelation components of the supervised elastic net inherently rely on
the pairwise correlation structure in the feature data. In settings
where the number of variables is high, the feature data are relatively
easy to obtain, and the response is expensive to generate, it is natural
to want to use any existing unlabeled
observations to estimate these correlations more accurately. However, the
supervised elastic net is not able to incorporate this information and
focuses only on the information within the labeled data. In this talk, I
will demonstrate that the joint trained elastic net does indeed allow the
unlabeled data to influence the variable selection, decorrelation, and
shrinkage capabilities of the linear estimator. In addition, the impact
of unlabeled data on the risk and bias of the proposed estimator is
investigated, and the merits of the proposed approach are discussed. The
joint trained elastic net is demonstrated on a text mining application.
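The motivating observation, that unlabeled feature rows can sharpen the pairwise correlation estimates the enet depends on, can be seen in a toy numpy comparison. This is only an illustration of the idea under assumed synthetic data, not the joint trained elastic net itself:

```python
import numpy as np

# Toy setup: p features with true off-diagonal correlation 0.6.
rng = np.random.default_rng(1)
p = 10
cov = 0.6 * np.ones((p, p)) + 0.4 * np.eye(p)
X_labeled = rng.multivariate_normal(np.zeros(p), cov, size=15)      # few labeled rows
X_unlabeled = rng.multivariate_normal(np.zeros(p), cov, size=500)   # many unlabeled rows

# Pairwise correlations from the labeled rows alone vs. pooled with unlabeled rows.
corr_labeled = np.corrcoef(X_labeled, rowvar=False)
corr_pooled = np.corrcoef(np.vstack([X_labeled, X_unlabeled]), rowvar=False)

# Mean absolute error of the off-diagonal entries against the true value 0.6.
off_diag = ~np.eye(p, dtype=bool)
err_labeled = np.abs(corr_labeled[off_diag] - 0.6).mean()
err_pooled = np.abs(corr_pooled[off_diag] - 0.6).mean()
```

With only 15 labeled rows the correlation estimates are noisy; pooling in the unlabeled rows (which require no response values) drives the estimation error down, which is the information a supervised-only estimator discards.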
Friday, 30 March
3:00pm - 4:00pm
2203 SAS Hall