
This course will provide a detailed treatment of regression models
and associated inferential methods for both univariate and
multivariate (e.g., repeated measures) response. The first half to
two-thirds of the course will focus on nonlinear regression models for
univariate response, including models for nonconstant response
variance. The remainder of the course will introduce extensions of the
univariate model to two popular types of nonlinear regression models
for multivariate response: (i) population-averaged models and models
for covariance structure, whose fitting methods are popularly known in
the literature as "generalized estimating equations" (GEEs); and (ii)
nonlinear mixed effects (subject-specific) models. Properties of
competing inferential techniques and the effects of model
misspecification will be studied both via theoretical arguments carried
out at a nonrigorous, heuristic level and via simulation exercises
carried out by students. Although we will go through theoretical
arguments in class in some detail, and students will be expected to
understand and be able to carry out similar arguments at the same
level, our main objective will be to appreciate the implications of
the results for practice rather than the technical details.
Implementation of the methods and application to data will be
emphasized in the homework assignments.
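As a point of reference, one standard way to write the univariate nonlinear model with nonconstant variance is shown below. The notation (mean function f, variance function g, regression parameter beta, variance parameter theta, scale sigma) is assumed here for illustration; the class notes remain the authoritative statement. The second display is the linear GLS estimating equation for beta when theta is treated as known, with f_beta denoting the gradient of f with respect to beta.

$$
E(Y_j \mid x_j) = f(x_j, \beta), \qquad \operatorname{var}(Y_j \mid x_j) = \sigma^2 g^2(\beta, \theta, x_j), \qquad j = 1, \ldots, n,
$$

$$
\sum_{j=1}^{n} \sigma^{-2} g^{-2}(\beta, \theta, x_j)\,\{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta) = 0,
\qquad f_\beta(x, \beta) = \frac{\partial f(x, \beta)}{\partial \beta}.
$$

A common special case is the power-of-the-mean model, g(beta, theta, x) = f(x, beta)^theta, so that the response variance grows with the mean.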
Course prerequisites
ST 512R, Experimental Statistics for Biological Sciences II; ST 552,
Linear Models and Variance Components; and familiarity with SAS or
R/Splus and a scientific computing language (e.g., MATLAB, FORTRAN,
C++, SAS IML). Students should have a strong background in probability
and inference at the level of ST 521 and ST 522 (the prerequisites for
ST 552).
Course topics
- Introduction and motivation
- Models for univariate response
  - Introduction to nonlinear models
  - Implementation of generalized least squares (GLS) and iteratively
    reweighted least squares (IRWLS)
  - Generalized (non)linear models, quasilikelihood
  - Normal theory maximum likelihood (ML)
  - Unknown parameters in the variance function
  - Detecting and modeling nonconstant variance
  - Large sample theory - a casual approach
  - The "folklore" theorem and "optimality" of GLS
  - Linear vs. quadratic estimating equations for the regression
    parameter
  - Effect of estimating weights in GLS
  - Estimation of unknown parameters in variance function models
- Models for multivariate response
  - Modeling multivariate response - sources of correlation and
    "subject-specific" vs. "population-averaged" approaches
  - Generalized estimating equation methods for population-averaged
    models
  - Nonlinear and generalized linear mixed effects (subject-specific)
    models - approximate and "exact" methods
See the class notes below for more detailed information.
Syllabus
Teaching Assistant
- Ying (Jessie) Zhang
  Office hours: Fridays, 10:00 - 11:00 am, in 9 Patterson
Class notes

Class notes in PDF format
Homework assignments and tentative due dates
Homework solutions
- Homework 1 Solutions, plot of the data, GLS algorithm program in R
  and output, and GLS algorithm program in R and output.
  Homework 1 Extra Problems Solutions, and program (in R) and output
  for Problem 6.
- Homework 2 Solutions. Problem 1: plot of the data, GLS algorithm
  program in R and output; and Problem 2: IRWLS program in R and
  output, GLS algorithm program in R and output.
  Homework 2 Extra Problems Solutions.
- Homework 3 Solutions. Problem 1: (a) plot of the data, (b) GLS-PL
  algorithm program in R and output, (c) GLS-identity algorithm program
  in R and output, (d) GLS-log algorithm program in R and output, (e)
  normal theory ML program in R and output; and Problem 2: program in R
  and output, OLS residual plots, and GLS residual plots.
  Homework 3 Extra Problems Solutions.
- Homework 4 Solutions.
  Homework 4 Extra Problems Solutions.
- Homework 5 Solutions. Problem 2: (d) SAS proc genmod program and
  output, and (e) SAS macro nlinmix program and output.
- Homework 6 Extra Problems Solutions. Problem 3: (a) SAS macro nlinmix
  program and output, (b) R function nlme() program and output, and (c)
  SAS proc nlmixed program and output.
Data analysis project
- In the data analysis project, you will apply the methods covered in
  class to a scientific challenge posed by subject-matter investigators
  and prepare a final report for the investigators. Here is the data
  set. The project is due Friday, October 26, at 3:00 pm.
Test
- The test has been graded -- you can pick yours up from Sasha's
desk in 220 Patterson. Here are the solutions.
Final project
- You have been assigned a paper from the statistical literature
  related to the content of the course (paper assignments and more
  information).
- Final project presentations will take place on FRIDAY, DECEMBER 14,
  9 am - noon in Patterson 208.
- Five-page summary papers are also due on FRIDAY, DECEMBER 14. These
  should be TYPED in DOUBLE-SPACED format.
SAS and R examples (in class notes)
- Section 3.7, Program 3.1. IRWLS with theta known, SAS program and
  output.
- Section 3.7, Program 3.2. GLS algorithm with theta known, SAS program
  and output.
- Section 3.7, Program 3.3. GLS algorithm with theta known, R program
  and output.
- Section 6.8, Program 6.1. GLS algorithm with theta unknown and
  estimated, SAS program and output using PL (quadratic estimating
  equation).
- Section 6.8, Program 6.2. GLS algorithm with theta unknown and
  estimated, R program and output using PL (quadratic estimating
  equation).
- Section 14.7, Program 14.1. Fitting multivariate data using GEE
  methods: linear estimating equation for beta and simple moment
  methods for correlation parameters using SAS PROC GENMOD.
  Demonstrated on the epileptic seizure data of Thall and Vail (1990).
  Data set, program, and output.
- Section 14.7, Program 14.2. Fitting multivariate data using GEE
  methods: linear estimating equation for beta and simple moment
  methods for correlation parameters using R function gee(). Program,
  output, and help file for gee().
- Section 14.7, Program 14.3. Fitting multivariate data using GEE
  methods: linear estimating equation for beta and quadratic estimating
  equation for correlation parameters using SAS macro NLINMIX. There
  are two versions of the macro; here we use the most recent version,
  as in the class notes: program, log file, and output.
- Section 15.6, Program 15.1. Fitting nonlinear mixed effects models
  using a two-stage approach with the EM algorithm for stage 2 using R:
  data set, program, and output.
- Section 15.6, Program 15.2. Fitting nonlinear mixed effects models
  using a two-stage approach with mixed model software for stage 2:
  program to create "data", program to fit stage 2 using proc mixed,
  and output.
- Section 15.6, Program 15.3. Fitting nonlinear mixed effects models
  using the first-order linearization method with linear estimating
  equations using SAS macro nlinmix for version 8.0 and above. There
  are two versions of the macro. Using the most recent version, as in
  the notes: program, log file, and output. Using the older version:
  program, log file, and output.
- Section 15.6, Program 15.4. Fitting nonlinear mixed effects models
  using the refined linear approximation about empirical Bayes
  estimates of the random effects using SAS macro nlinmix (most recent
  version), log file, and output.
- Section 15.6, Program 15.5. Fitting nonlinear mixed effects models
  using the refined linear approximation about empirical Bayes
  estimates of the random effects using R nlme(): program and output.
- Section 15.6, Program 15.6. Fitting nonlinear mixed effects models
  using the "exact" likelihood method with integration carried out via
  adaptive Gaussian quadrature using SAS proc nlmixed: program and
  output.
- Section 15.6, Program 15.7. Fitting generalized linear mixed models
  using PQL with SAS macro glimmix: program, log file, and output; and
  the glimmix macro for version 8.0 and above of SAS.
- Section 15.6, Program 15.8. Fitting generalized linear mixed effects
  models using the "exact" likelihood method with integration carried
  out via adaptive Gaussian quadrature using SAS proc nlmixed: program
  and output.
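The GLS algorithm with theta known (Programs 3.2 and 3.3 in SAS and R) can be sketched roughly as follows. This is a minimal Python/NumPy illustration, not the course program: the exponential-decay mean function, the power-of-the-mean variance model var(Y_j) = sigma^2 f(x_j, beta)^(2 theta), the number of cycles, and all function names are hypothetical choices made for the example, and the data are simulated. The steps are: fit by ordinary least squares, form estimated weights 1/g^2 at the current beta, refit by weighted least squares with those weights held fixed, and repeat for a few cycles.

```python
import numpy as np


def mean_fn(x, beta):
    """Hypothetical mean model f(x, beta) = beta0 * exp(-beta1 * x)."""
    return beta[0] * np.exp(-beta[1] * x)


def mean_grad(x, beta):
    """n x p matrix whose j-th row is the gradient of f(x_j, beta) in beta."""
    e = np.exp(-beta[1] * x)
    return np.column_stack([e, -beta[0] * x * e])


def wss(x, y, w, beta):
    """Weighted sum of squares sum_j w_j {y_j - f(x_j, beta)}^2."""
    r = y - mean_fn(x, beta)
    return np.sum(w * r ** 2)


def weighted_gauss_newton(x, y, w, beta, tol=1e-10, max_iter=200):
    """Minimize the weighted sum of squares with the weights w held fixed."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(max_iter):
        r = y - mean_fn(x, beta)
        X = mean_grad(x, beta)
        XtW = X.T * w  # multiplies column j of X^T by w_j, i.e. X^T W
        step = np.linalg.solve(XtW @ X, XtW @ r)
        # Step halving: shrink the step until the objective does not increase.
        t, current = 1.0, wss(x, y, w, beta)
        while wss(x, y, w, beta + t * step) > current and t > 1e-8:
            t /= 2.0
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol * (1.0 + np.max(np.abs(beta))):
            break
    return beta


def gls_theta_known(x, y, beta_start, theta, n_cycles=3):
    """GLS algorithm with the variance parameter theta treated as known.

    Assumed variance model: var(Y_j) = sigma^2 * f(x_j, beta)^(2 * theta).
    """
    # Stage 1: ordinary least squares (all weights equal to 1).
    beta = weighted_gauss_newton(x, y, np.ones_like(y), beta_start)
    for _ in range(n_cycles):
        # Stage 2: estimated weights 1 / g^2 evaluated at the current beta.
        w = 1.0 / mean_fn(x, beta) ** (2.0 * theta)
        # Stage 3: weighted least squares with these weights held fixed.
        beta = weighted_gauss_newton(x, y, w, beta)
    # Residual-based estimate of sigma using the final weights.
    w = 1.0 / mean_fn(x, beta) ** (2.0 * theta)
    sigma = np.sqrt(wss(x, y, w, beta) / (len(y) - len(beta)))
    return beta, sigma


# Usage on simulated data with beta = (10, 0.5), theta = 0.5, sigma = 0.3.
rng = np.random.default_rng(762)
x = np.repeat(np.linspace(0.0, 4.0, 9), 20)
f_true = mean_fn(x, [10.0, 0.5])
y = f_true + 0.3 * f_true ** 0.5 * rng.standard_normal(x.size)
beta_hat, sigma_hat = gls_theta_known(x, y, beta_start=[8.0, 0.3], theta=0.5)
print("beta_hat:", beta_hat, "sigma_hat:", sigma_hat)
```

Holding the weights fixed within each cycle, rather than letting them change with beta inside the minimization, is what separates this GLS scheme from minimizing a criterion in which beta also enters the weights; with theta = 0 every weight is 1 and the procedure reduces to ordinary least squares.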
Errata list
The errata list will be updated as we find typos!
Announcements (most recent shown first)
- Remember, final project presentations will take place on FRIDAY,
DECEMBER 14, 9:00 am - 12:00 noon in 208 PT. Please email your
presentation to me by 8:30 am on that day so that I can load all
presentations onto the computer in 208. Your final summary paper is
also due on this day.
- There will be NO CLASS on Thursday, December 6 -- we are DONE!
- There was a typo in the statement of the model in Problem
1 of Homework 5; see above.
- We WILL NOT have class on Thursday, November 29.
- There WILL be class on Thursday, November 8.
- In the description of the model for the data analysis project,
the interpretation of tau* as described by the investigators is not
exactly correct! In the third bullet on page 3, the question being
asked by the investigators thus needs to be reformulated -- the
question they have CANNOT be addressed by simply comparing tau and
tau*!! In your report, you will need to explain to the investigators
what is wrong with the interpretation they have given you, and do a
comparison that you think makes better sense to address their
questions in the third and fourth bullets.
- The due date for the data analysis project has been EXTENDED
to FRIDAY, OCTOBER 26, AT 3:00 PM. Please turn in yours to me or
to Sasha, or put it in my mailbox.
- There will be NO CLASS on Tuesday, September 18, 2007.