Suppose someone notices that in his sample of 30 supposedly normal data points, the highest point is 0.5 standard deviations above the next highest. That gap, he claims, is unusual. Now we can be sure that the probability that the highest two in a normal sample of 30 will differ by more than 0.5 standard deviations is not a function of how big the true standard deviation is, nor is it a function of the mean, because shifting or rescaling the data leaves the gap, measured in standard deviations, unchanged. Thus we can study just N(0,1) data to get the distribution of this gap. I am interested in the mean gap (in units of standard deviation) and the 5th and 95th percentiles (after all, the top two could also possibly be too close together).
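The claim that the gap does not depend on the mean or the standard deviation can be checked numerically. The following is only an illustrative sketch in Python, not the SAS program the assignment asks for. The helper name top_gap, the seed, and the values of mu and sigma are all made up for the illustration. It builds one N(0,1) sample of 30, moves it to the N(mu, sigma) scale, and compares the two gaps in standard deviation units.

```python
import random

def top_gap(values):
    """Return the difference between the largest and second-largest value."""
    top_two = sorted(values)[-2:]
    return top_two[1] - top_two[0]

rng = random.Random(2024)                     # arbitrary seed, for reproducibility only
z = [rng.gauss(0.0, 1.0) for _ in range(30)]  # one N(0,1) sample of 30 points

mu, sigma = 50.0, 7.0                         # hypothetical mean and standard deviation
y = [mu + sigma * zi for zi in z]             # the same sample on the N(mu, sigma) scale

gap_z = top_gap(z)                            # gap for the N(0,1) data
gap_y = top_gap(y) / sigma                    # gap in units of the true standard deviation

# The two numbers agree up to floating-point rounding.
print(gap_z, gap_y)
```

Since every N(mu, sigma) sample can be written this way, the distribution of the gap in standard deviation units is the same as for N(0,1) data.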
Simulate 10,000 of these gaps and use PROC UNIVARIATE to get the percentiles. Make a note of the mean and the 5th and 95th percentiles (highlight or circle them, for example). As you run your simulations, make a variable X that is 0 if the gap is less than 0.5 and 1 otherwise. Compute the mean of this variable using PROC MEANS. Why are we interested in that mean? Note that these are Bernoulli trials (0,1). The mean and variance of such trials are related, as we learn in stat classes. What is the variance implied by the mean of X, and what is the variance estimated by PROC MEANS? (Any non-stat majors can skip that last part.)
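For orientation, the logic of the simulation can be sketched as follows. This is a Python outline under stated assumptions, not the SAS solution: the seed and all variable names are made up, and the percentile rule used by Python's statistics.quantiles is not the same as the PROC UNIVARIATE default, so the percentiles may differ slightly from your SAS output.

```python
import random
import statistics

N_REPS = 10_000   # number of simulated samples, as in the assignment
N = 30            # points per sample
CUTOFF = 0.5      # gap threshold, in standard deviations

rng = random.Random(12345)   # arbitrary seed, for reproducibility only

gaps = []   # gap between the two largest values in each sample
x = []      # indicator: 0 if the gap is less than CUTOFF, 1 otherwise
for _ in range(N_REPS):
    sample = sorted(rng.gauss(0.0, 1.0) for _ in range(N))
    gap = sample[-1] - sample[-2]
    gaps.append(gap)
    x.append(0 if gap < CUTOFF else 1)

# quantiles(..., n=100) returns the 99 cut points P1, ..., P99,
# so index 4 is the 5th percentile and index 94 is the 95th.
pct = statistics.quantiles(gaps, n=100)
mean_gap = statistics.mean(gaps)
p5, p95 = pct[4], pct[94]

p_hat = statistics.mean(x)            # proportion of gaps at or above the cutoff
implied_var = p_hat * (1 - p_hat)     # Bernoulli variance implied by the mean
sample_var = statistics.variance(x)   # divisor n - 1, like the PROC MEANS default

print("mean gap:", mean_gap, " P5:", p5, " P95:", p95)
print("mean of X:", p_hat)
print("implied variance:", implied_var, " sample variance:", sample_var)
```

With a divisor of n - 1, the sample variance of a 0/1 variable equals n/(n - 1) times p_hat(1 - p_hat), so with 10,000 trials the two variances should agree to several decimal places.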
If you get stuck on this, the program IML3.SAS will be of help. Please try it on your own first. If you use a modification of IML3, then at least explain that portion of the program that uses the rank[ ] function. You will need to change a couple of other features as well. This will be our last assignment.
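IML3.SAS is not reproduced here, so the following is only an assumption about what a rank-based approach might look like, written in Python rather than IML. The function rank below is a hypothetical stand-in, and the data values are made up. The idea is that once every value has a rank from 1 (smallest) to n (largest), the two largest values are the ones with ranks n and n - 1.

```python
def rank(values):
    """Rank of each element: 1 for the smallest, len(values) for the largest.
    Ties receive distinct ranks in order of appearance."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

data = [0.3, -1.2, 2.1, 0.9, 1.4]   # made-up values for illustration
r = rank(data)                       # ranks are [2, 1, 5, 3, 4]
n = len(data)

largest = data[r.index(n)]           # value whose rank is n
second = data[r.index(n - 1)]        # value whose rank is n - 1
gap = largest - second

print(r, largest, second, gap)
```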