## Course objective

To introduce students to statistical models and methods for the analysis of longitudinal data, i.e., data collected repeatedly on individuals (humans, animals, plants, samples, etc.) over time (or other conditions).

## Course prerequisites

ST 512, Experimental Statistics for Biological Sciences II, or equivalent. Thus, students should be familiar with basic notions of probability, random variables, and statistical inference; analysis of variance; and (multiple) linear regression. Familiarity with matrix algebra is also useful. We will review matrix algebra at the beginning of the course and make considerable use of matrix notation and operations throughout. ST 512 involves the use of the SAS (Statistical Analysis System) software package; thus, students are expected to have had some exposure to the use of SAS. The course is meant to be accessible both to non-majors and majors. The underlying mathematical theory will not be stressed, and the main focus will be on concepts and applications. Please see the instructor if you have questions about the suitability of your background.

## Course topics

• Preliminaries: Introduction, Review of matrix algebra, random vectors, multivariate normal distribution, review of linear regression
• Classical methods for normally distributed, balanced repeated measurements: Univariate repeated measures analysis of variance, Multivariate repeated measures analysis of variance, Drawbacks and limitations of classical methods
• Methods for normally distributed, unbalanced repeated measurements: General linear models and models for correlation, Random coefficient models, Linear mixed effects models, Population-averaged vs. Subject-specific modeling
• Methods for non-normally distributed, unbalanced data: Probability models for discrete and continuous non-normal data and generalized linear models, Generalized estimating equations for population-averaged models
• Advanced topics (quick overview): Generalized linear mixed effects models, Nonlinear mixed effects models, Missing data mechanisms

See the class notes below for more detailed information.

#### Tentative syllabus

#### Teaching Assistant

• Lihua Tang, Office hours M 11:30 am - 12:30 pm, 9 Patterson Hall, ltang@ncsu.edu

#### Class notes

Class notes (in pdf format) will be available here in March 2007; if you are taking this class in Spring 2007, you will need to purchase the notes at the NCSU Bookstores.

#### Homework assignments and tentative due dates

#### Homework solutions

#### Tests

• Test 1 took place Tuesday, March 20 and covered Chapters 1 to 8 of the notes (through Homework 3; Chapter 6 not included). Here are the solutions, a histogram of the grades (scores are out of 100), and summary statistics: Mean = 84.5, Median = 86, Standard deviation = 12.6, n = 27.

• Test 2 took place Thursday, May 3 and covered everything since Test 1. Here are the solutions, a histogram of the grades (scores are out of 100), and summary statistics: Mean = 90.9, Median = 92.5, Standard deviation = 7.2, n = 26.

#### Data analysis project

## SAS on-line documentation

#### SAS and R examples (in class notes)

• Plotting using R: The R language is an open source computing environment for statistics and graphics that is available here at NCSU. (R is an open source counterpart of S-PLUS; both implement the S language.) Although we use SAS in this course to carry out analyses (which can also be done in R, by the way), the instructor vastly prefers R for making graphics. All of the plots in the course notes and in homework solutions are therefore created with R. Here is a sample R program that reads in the dental study data introduced in Chapter 1 of the notes and creates two plots (output to pdf files): a two-panel spaghetti plot of the girl and boy data separately with sample mean profiles superimposed, and a spaghetti plot of the girl and boy data together using the gender indicator as the plotting symbol, as in Figure 1 of Chapter 1.
• Dental data: The dental study data in Example 1 of Chapter 1, which we analyze repeatedly in later chapters for illustration, are world-famous and used by many authors discussing longitudinal data methods. Here is a picture of the pterygomaxillary fissure.
• Chapter 4, Example 1 (computation of sample mean vectors, sample covariance and correlation matrices, pooled covariance and correlation matrices, data for scatterplot matrices, lag plots, and autocorrelation functions using PROCs MEANS, CORR, DISCRIM, GLM, and MIXED for the dental data): program, output, and data set. The program outputs the centered/scaled distances in the file dentcenter.dat, which may be read into the R program dentscatter.R to obtain the scatter plots for girls and boys. A SAS program shows how to call PROC INSIGHT to make the scatter plots for girls and boys.
• Chapter 5, Example 1 (analysis of dental data by univariate repeated measures analysis of variance, PROC GLM): program, output, and data set
• Chapter 5, Example 2 (analysis of guinea pig diet data by univariate repeated measures analysis of variance, PROC GLM): program, output, and data set
• Chapter 6, Example 1 (analysis of dental data by multivariate repeated measures analysis of variance, PROC GLM): program, output, and data set
• Chapter 6, Example 2 (analysis of guinea pig diet data by multivariate repeated measures analysis of variance, PROC GLM): program, output, and data set
• Chapter 8, Example 1 (analysis of dental data using general linear population averaged regression model, PROC MIXED): program, output, and data set
• Chapter 8, Example 2 (analysis of ultrafiltration data using general linear population averaged regression model, PROC MIXED): program, output, and data set
• Chapter 8, Example 3 (analysis of hip replacement data using general linear population averaged regression model, PROC MIXED): program, output, and data set
• Chapter 9, Example 1 (analysis of dental data using a random coefficient model, PROC MIXED): program, output, and data set
• Chapter 9, Example 2 (analysis of ultrafiltration data using a random coefficient model, PROC MIXED): program, output, and data set
• Chapter 10, Example 1 (analysis of dental data using linear mixed effects model, PROC MIXED): program, output, and data set
• Chapter 10, Example 2 (analysis of weight-lifting data using a linear mixed effects model, PROC MIXED): program, output, and data set
• Chapter 11, Example 1 (analysis of horsekick data using a generalized linear model, PROC GENMOD): program, output, and data set
• Chapter 11, Example 2 (analysis of myocardial infarction data using a generalized linear model, PROC GENMOD): program, output, and data set
• Chapter 11, Example 3 (analysis of clotting time data using a generalized linear model, PROC GENMOD): program, output, and data set
• Chapter 12, Example 1 (analysis of epileptic seizure data using a population-averaged model and GEE, PROC GENMOD): program, output, and data set
• Chapter 12, Example 2 (analysis of wheezing data using a population-averaged model and GEE, PROC GENMOD): program, output, and data set

#### Errata list

The errata list will be updated as we find typos!

#### Announcements (most recent shown first)

• GRADED TEST 2 is available in 220 Patterson Hall on Sasha's desk. Grades should be available on-line Tuesday morning, May 8.

• TEST 2 will be held in 208 PATTERSON HALL (NOT the classroom) during the scheduled exam period on Thursday, May 3, 8:00 - 11:00 am. Test 2 will cover Chapters 9, 10, 11, and 12. As with Test 1, you will be allowed ONE 8.5 x 11 inch sheet of HANDWRITTEN notes, ONE SIDE OF THE SHEET ONLY.

• The data set links for Homework 5 were previously INCORRECT and have been changed (as of 4/5/07).

• Our TA, Lihua Tang, is taking her preliminary oral exam on Monday, 3/26. Hence, she will not hold office hours at her usual time this week; instead, she will hold office hours on Thursday 3/29 from 12:00 noon to 1:00 pm.

• There is a TYPO at the bottom of page 305 of the notes. The displayed model statement should be:
```
model ufr = c1 c2 c3 tmp c1*tmp c2*tmp c3*tmp / solution;
```
• TEST 1 will take place TUESDAY, MARCH 20, in 208 PATTERSON HALL, from 6:00 - 8:00 pm. You are allowed ONE 8.5 x 11 inch sheet of HANDWRITTEN notes, ONE SIDE OF THE SHEET ONLY.

The test will cover Chapters 1-8, except for Chapter 6, which we did not cover.

• We WILL have class at the usual time on TUESDAY, MARCH 20. I will have my usual office hours from 10 - 11 after class.

• Graded Homework 3 will be available after 9:30 am in 220 Patterson Hall. The solutions are posted above.

• Our TA, Lihua Tang, will also be out of town next week; hence, she will not be able to hold her regular office hours on Monday, March 12, 11:30 - 12:30. She has asked me to let you know this and that she would be happy to take questions by email.

• Because of the change in the test date, I have changed the due date for Homework 3 to be Tuesday, March 13 (after spring break). There is no class this day, but YOUR HOMEWORK MUST BE TURNED IN BY NOON ON THE 13TH TO SASHA MIAO IN 220 PATTERSON HALL!!! No late homework accepted for reasons other than dire emergencies.

• There will be NO CLASS on Tuesday, March 13 (Tuesday after spring break).

• Date of Test 1. As we discussed in class on February 22, the date of the test is now set as Tuesday, March 20 -- details will be given in class.

• Typo in Homework 3, Problem 3. In Problem 3, the coding of the treatment indicator is noted incorrectly. It should say (at the bottom of page 4):

Treatment indicator (=1 if placebo, =2 if low dose, =3 if high dose)

• Typo in Homework 2, Problem 1(e). In Problem 1(e), it should say "q=4 groups" (not "times").

• Typo in Homework 2, Problem 2(g). In Problem 2(g), it should read "Returning to the issue in (d)..." (that is, it should refer to (d) instead of (c)).

• Forgotten Homework 2: Hard copies of Homework 2 were supposed to be handed out on Tuesday, February 6, but were not. These will be handed out on Thursday, February 8. In the meantime, you may download a copy above.

• SAS procedure for standardizing variables: There is a procedure called PROC STDIZE that will automatically standardize variables. See the updated version of the program for Example 1 in Chapter 4 for demonstration of its use.
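
As a rough illustration of what such a call might look like (a sketch only, not the course program): the data set names dent1 and dentstd, and the choice to standardize distance separately within each gender and age, are assumptions made here for illustration. The updated Chapter 4 program remains the authoritative demonstration.

```sas
/* Sketch only: dent1, dentstd, and the BY grouping are illustrative assumptions */
proc sort data=dent1;
  by gender age;
run;

/* METHOD=STD subtracts the mean and divides by the standard deviation */
proc stdize data=dent1 out=dentstd method=std;
  by gender age;
  var distance;
run;
```

The BY statement requires the data to be sorted by the same variables, which is why the proc sort step comes first.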

• Typos in Homework 1, problem 7(a): In this problem, time is in units of "hours." In part (a)(ii), "day" should be "time (hours)." In part (a)(iv), "month" should be "time (hours)."

• Homework 1: Several of you have asked about fitting model (3) in problem 5(e) of the homework. At the bottom of page 3, you are asked to write a SAS program that reads in the data and then fits both models (2) and (3). Here are some comments:

(i) When you read the data in from the file, do not transform the times from the age scale (8, 10, 12, 14) to (1, 2, 3, 4) as in the example code at the end of the chapter. You are fitting a regression model in the ACTUAL times t_j, so if you transform them you will not be fitting the correct models.

(ii) When you fit model (3), the problem says to "call proc glm as above, but now use instead the model statement"

model distance = gender age gender*age / solution;

This means that you should KEEP the class statement you used to fit model (2), and just replace the model statement for model (2) with this one. If you also eliminate the class statement, you will not be fitting model (3). The point here is that the way SAS procedures parameterize models by default when a class statement is used to set up indicator ("dummy") variables for a covariate that takes on "categorical" values (like 0 and 1) may not be as you expect. (There are ways to override the default; if you are interested, see the SAS documentation for proc glm.)
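
Putting comments (i) and (ii) together, the call for model (3) might look like the following. This is a sketch only: it assumes the dental data are read from dental.dat with the variable layout used in the proc transpose example further down this page, and it does not reproduce the model statement for model (2), which is given in the homework.

```sas
/* Sketch only: assumes the dental.dat layout used elsewhere on this page */
data dent1; infile 'dental.dat';
  input obsno child age distance gender;   /* age stays on the 8, 10, 12, 14 scale */
run;

proc glm data=dent1;
  class gender;                            /* KEEP the class statement from model (2) */
  model distance = gender age gender*age / solution;
run;
quit;
```

The solution option prints the parameter estimates, which is where the default parameterization created by the class statement becomes visible.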

• Chapter 4, Example 1: There is an alternative and much easier way to transform data that are stored in the format of one data record (line) per observation to that of one data record (line) per individual:
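```
/* Read the data: one record per observation */
data dent1; infile 'dental.dat';
input obsno child age distance gender;
run;

/* Reshape to one record per child; the four distances become age1-age4 */
proc transpose data=dent1 out=dent2 prefix=age;
by gender child notsorted;
var distance;
run;
```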
Thanks to Laine Elliott for sending code to show that proc transpose can do this! The Example 1, Chapter 4 programs have been updated to include this code.

• Chapter 4, Example 1: Other ways to get the pooled sample covariance and pooled sample correlation matrices: I deliberately lied to you yesterday when I said that the only way to get these matrices in SAS is using proc discrim. It is also possible to get them using proc mixed (which we will introduce later in the course) and (almost) using proc glm with the manova option. The program for Example 1, Chapter 4 has been updated to add code to show how this may be accomplished.
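
For reference, the proc discrim route mentioned above might be sketched as follows. It assumes the one-record-per-child data set dent2 produced by the proc transpose code in the preceding announcement, whose distance variables are then named age1 through age4. The updated Example 1 program remains the authoritative version.

```sas
/* Sketch only: assumes dent2 (one record per child) with variables age1-age4 */
proc discrim data=dent2 pcov pcorr;   /* pooled within-class covariance and correlation */
  class gender;
  var age1-age4;
run;
```

Here pcov and pcorr ask for the pooled within-class covariance and correlation matrices, with gender defining the classes.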

• The due date for Homework 1 has been CHANGED to Tuesday, February 6. This change is reflected above.