SAS Tutorial

If you are new to SAS, you may want to practice writing some simple SAS programs. This tutorial is designed to help you take the first steps toward learning SAS.

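You will generally follow these steps: type a program in the Program window, execute it, check the Log window for errors, fix any mistakes and execute again, and then view the results in the Output window.
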
When you start SAS, you are in 'menu mode'. This tutorial is written as if you are in 'command-line mode'. To switch from the pull-down menus at the top of the window to a command line, use the Control-A key sequence: hold down the Control key and press the letter a. See the command line come up?

Let's write a typical DATA step for a SAS program. Type the program below into the Program window.

DATA ht_wt;
INPUT name $ sex $ age height weight; 
wtkilo=weight*.45;
CARDS;
ALFRED     M     14     69     112
BARBARA     F     13     62     102
JAMES     M     12     57     083
JANE     F     12     59     084
JOHN     M     12     59     099
JUDY     F     14     64     090
LOUISE     F     12     56     077
MARY     F     15     66     112
RONALD     M     15     67     133
WILLIAM     M     15     66     112
PROC PRINT;
RUN;

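The DATA statement creates the SAS dataset ht_wt. You could use any name here as long as it begins with a letter (or an underscore) and is no longer than 8 characters. The 8-character limit applies to older SAS releases; SAS 7 and later allow names of up to 32 characters.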

The INPUT statement names the variables, one for each column of data in the dataset. The $ after the variables name and sex is necessary to read in their values: it tells SAS that the variable contains character data, not numeric data. Notice that every statement in the DATA step ends with a semicolon.
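To confirm which variables SAS stored as character and which as numeric, you could run PROC CONTENTS on the dataset. This is a minimal sketch that assumes ht_wt, created by the DATA step above, is still available in the current session.

```sas
/* List the variables in ht_wt with their types (Num or Char) and lengths. */
/* Assumes ht_wt was created by the DATA step above in this session.       */
PROC CONTENTS DATA=ht_wt;
RUN;
```

In the listing, name and sex should appear with type Char, and age, height, weight, and wtkilo with type Num.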

The statement

wtkilo = weight*.45;

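creates a new variable called wtkilo by converting each individual's weight from pounds to kilograms. The factor .45 is an approximation (one pound is about 0.4536 kilograms); for ALFRED, for example, wtkilo is 112*.45 = 50.4.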

The CARDS statement tells SAS that you are finished with the instructions for creating the dataset ht_wt and that the very next line starts the data itself. The keyword is named CARDS because, years ago, data were fed into computers on real punched cards, with holes representing different characters or numbers.

The values on each data line must appear in exactly the same order as the variables in the INPUT statement. SAS tells program statements apart from data by the semicolon: the first line after CARDS that contains a semicolon ends the data. Your data lines therefore must not contain any semicolons.
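SAS also accepts the keyword DATALINES in place of CARDS, and it is common practice to end the data lines with a line containing only a semicolon (a null statement) and to close the DATA step with its own RUN; before the next step starts. The sketch below rewrites the tutorial's DATA step that way. It creates the same dataset ht_wt with the same values, so the rest of the tutorial works unchanged.

```sas
/* Same dataset as above, with the end of the data marked explicitly. */
DATA ht_wt;
   INPUT name $ sex $ age height weight;
   wtkilo = weight*.45;   /* pounds to kilograms (approximate factor) */
   DATALINES;
ALFRED   M 14 69 112
BARBARA  F 13 62 102
JAMES    M 12 57 083
JANE     F 12 59 084
JOHN     M 12 59 099
JUDY     F 14 64 090
LOUISE   F 12 56 077
MARY     F 15 66 112
RONALD   M 15 67 133
WILLIAM  M 15 66 112
;
RUN;

PROC PRINT DATA=ht_wt;
RUN;
```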

The PROC PRINT statement ends the DATA step and marks the beginning of a PROC step. This procedure prints the contents of the dataset in a neat table, with each column headed by the variable's name.
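By default, PROC PRINT uses the most recently created dataset and prints every variable. Here is a sketch of naming the dataset explicitly with DATA= and limiting the columns with a VAR statement, assuming ht_wt is still available:

```sas
/* Print only some columns of ht_wt, in the order listed on VAR. */
PROC PRINT DATA=ht_wt;
   VAR name weight wtkilo;
RUN;
```

Naming the dataset with DATA= is a safe habit once a session contains more than one dataset, because it does not depend on which dataset happened to be created last.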

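The RUN statement is included as the last line of the program to tell SAS to execute the steps you have submitted. End each SAS program with a RUN; statement: without it, SAS does not execute the last step, because it is still waiting to see whether more statements belong to that step.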

To execute the program, use the Control-E key sequence, or type the SAS command "submit" on the command line and press the RETURN key. (If you are using pull-down menus, you may select "Submit" from the "Locals" menu.) The key sequence has an advantage over the SAS command: when you use Control-E, your Log and Output windows are cleared automatically, so you don't have to remember to clear them yourself before executing the program. To use a Control sequence, hold down the Control key and press a letter key.

You will see your cursor jump to the bottom right corner of the screen; when it returns to the command line, execution is finished. This happens very fast, so you may not even notice it.

Don't worry when you see your program vanish. SAS automatically clears the Program window when you execute a program, and you will be able to retrieve the program.

Go to the Log window by placing the on-screen cursor in the Log window and clicking the left mouse button. The Log window explains what happened when SAS executed your program, including error messages or warnings for any mistakes in the program.

If the Log window shows error messages, you probably made a typing mistake; the messages indicate where the problem is.

If you have errors, you can move back and forth between the Log and Program windows, noting the errors in the Log window and fixing them in the Program window.

Go to the Program window by placing the on-screen cursor in the Program window and clicking the left mouse button.

Recall your program (remember that SAS clears this window when you execute a program) with Control-R, or type recall on the command line and press RETURN. If you are using pull-down menus, choose the "recall text" option from the "Locals" menu.

Usually, the first error is the most important one. After some errors, SAS stops executing the program and only checks the remaining statements for further errors. When that happens, you may see error messages next to statements that actually have no errors. These usually clear up once you fix the first error.

Use the Control-E key sequence to execute your program.

Once the program executes with no errors, go to the Output window by placing the on-screen cursor in the Output window and clicking the left mouse button.

The Output window contains the output your program requested: here, the listing produced by PROC PRINT.

Return to the Program window.

You can do further analyses on the dataset even though the original DATA step and data are no longer visible: the dataset ht_wt stays in SAS's temporary WORK library until you end your SAS session.
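By default, a one-level name such as ht_wt refers to a dataset in the temporary WORK library. As a quick check that the dataset is still there, you could print its first three observations using the equivalent two-level name work.ht_wt (OBS= is a dataset option that limits how many observations are read):

```sas
/* ht_wt and work.ht_wt name the same temporary dataset. */
PROC PRINT DATA=work.ht_wt (OBS=3);
RUN;
```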

Let's do a few more procedures before we quit.

PROC MEANS is often used to obtain descriptive statistics for your data. Its VAR statement specifies which variables you want analyzed.
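You can also list statistic keywords on the PROC MEANS statement to request specific statistics instead of the default set. A sketch, assuming ht_wt is still available; N, MEAN, STD, and SUM are PROC MEANS statistic keywords, and MAXDEC= sets the number of decimal places shown:

```sas
/* Number of observations, mean, standard deviation and sum, */
/* each printed with two decimal places.                     */
PROC MEANS DATA=ht_wt N MEAN STD SUM MAXDEC=2;
   VAR height weight wtkilo;
RUN;
```

For the ten students in this dataset, the mean height should come out as 62.50 and the mean weight as 100.40.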

Let's split the dataset into groups by sex to compare the descriptive statistics between them. To analyze a dataset separately for each value of a variable with a BY statement, you must first sort the dataset by that variable.
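PROC SORT without an OUT= option replaces the dataset with its sorted version. If you would rather keep ht_wt in its original order, this sketch writes the sorted observations to a new dataset instead; the name ht_sex is hypothetical.

```sas
/* Write a copy of ht_wt sorted by sex; ht_wt itself is unchanged. */
PROC SORT DATA=ht_wt OUT=ht_sex;
   BY sex;
RUN;

PROC PRINT DATA=ht_sex;
   VAR name sex;
RUN;
```

Because character values sort in ascending order, the F observations come before the M observations.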

Type in these statements to sort the data and calculate means for the two groups. You don't need to recall your previous statements: SAS still has the dataset ht_wt created by the earlier DATA step.


PROC SORT;
	BY sex;
PROC MEANS;
	VAR height weight wtkilo;
	BY sex;
TITLE 'Calculation of Means for Males and Females';
RUN;

The SORT procedure rearranges the observations of the dataset in ascending order of sex and replaces the original dataset with the sorted one.

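The PROC MEANS statement tells SAS that you want descriptive statistics calculated for some of the variables. By default it reports the number of observations, the mean, the standard deviation, the minimum, and the maximum.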

You name the variables for which you want means in the VAR statement. Since you can only calculate a mean of a numeric variable, the VAR statement can contain only numeric variables.

The BY statement in the MEANS procedure tells SAS to run a separate analysis for each value of the variable named in the BY statement (in this case, sex).
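PROC MEANS also has a CLASS statement, which groups the analysis without requiring sorted data and reports all the groups in a single table. A sketch of the same comparison with CLASS in place of BY, assuming ht_wt is still available:

```sas
/* CLASS groups the observations by sex; no PROC SORT is needed first. */
PROC MEANS DATA=ht_wt;
   CLASS sex;
   VAR height weight wtkilo;
RUN;
```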

Make sure that the variables in the BY statement of any PROC classify the data into a manageable number of categories or groups. The BY statement tells the procedure to run the analysis once for each value of the BY variable, so if that variable has fifty different values (as a continuous variable such as height or weight might), fifty different analyses will be run.
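In this dataset, age takes only four values (12 through 15), so it also works as a BY variable. A sketch, assuming ht_wt is still available; the sorted copy ht_age is a hypothetical name, used so that ht_wt keeps its current order:

```sas
/* Sort a copy by age, then run one analysis per age group. */
PROC SORT DATA=ht_wt OUT=ht_age;
   BY age;
RUN;

PROC MEANS DATA=ht_age;
   BY age;
   VAR height weight;
RUN;
```

This produces four separate analyses. The age-13 group contains only one student, so its standard deviation is reported as missing.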

Execute the code with Control-E and check the Log window for errors or warnings. If all is well, go to the top of the Output window and look at your output.

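The average height and weight for males are greater than for females: in this data, the males average 63.6 inches and 107.8 pounds, the females 61.4 inches and 93.0 pounds. However, a single number such as a mean can be misleading when comparing two groups, because it hides how the individual values overlap.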

Let's look at a comparison between the two groups by plotting height and weight by sex. This can be done by using the PROC PLOT statement.

Return to the Program window.

The following statements produce a scatterplot with height on the vertical axis, weight on the horizontal axis, and the value of sex (F or M) as the plotting symbol:

PROC PLOT;
	PLOT height*weight=sex;
	TITLE 'Plot of Height vs. Weight by Sex';
RUN;

There is no need to recall the previous program statements: SAS still has the dataset ht_wt, so just type in these statements and execute them. Check the Log window for errors; if there are none, go to the top of the Output window to see the plot.
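The plotting symbol can also be a fixed character in quotes, and a single PLOT statement can request several plots. A sketch, assuming ht_wt is still available; the second plot (weight against age) is an illustrative choice, not part of the original exercise:

```sas
PROC PLOT DATA=ht_wt;
   PLOT height*weight='*'   /* every point drawn as an asterisk        */
        weight*age=sex;     /* weight against age, F or M as the symbol */
   TITLE 'Height vs. Weight, and Weight vs. Age';
RUN;
```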

Now go back to the Program window and type the following statements to find out more about the relationship between height and weight. PROC CORR computes the (Pearson) correlation between the two variables and also prints simple descriptive statistics for each:

PROC CORR;
	VAR weight height;
	TITLE 'Correlation for Height and Weight';
RUN;
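Because ht_wt is already sorted by sex, you could also look at the correlation separately for each sex. This sketch re-sorts ht_wt first so it does not depend on the earlier PROC SORT having been run. With only five students per group, these correlations are very imprecise.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
PROC SORT DATA=ht_wt;
   BY sex;
RUN;

PROC CORR DATA=ht_wt;
   BY sex;
   VAR weight height;
   TITLE 'Correlation for Height and Weight, by Sex';
RUN;
```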

To leave SAS, give the command

bye

on any SAS command line. If you are using pull-down menus, choose the "exit" option from the "file" menu.

The SAS windows should disappear, and you should now see the UNIX prompt (%) in the xterm window. To log out, move to the background of the screen and hold down the right mouse button; the system menu will appear. Drag down to the logout option on the menu and release the mouse button. After a few seconds, the login window should reappear.