ST590G -- FALL 2011
Class Exercise #6 -- Thursday, 27 October 2011

Our Group ___________________________________________________

In the '/Class/ST590G' directory on the 'login' machine are several files of weekly weblogs from the Department of Statistics www4 web server. We will look at one file that has 50,000-60,000 records. You may want to debug using a smaller file: either 'start' with 100 records or 'w4000log' with 1000 records. An explanation of the fields is at the bottom.

1) Is the number of requests uniformly distributed throughout the day? Get the number of requests by hour of the day, or by minute, or by second (or by 10-second interval).

2) Create a table of requests for each faculty id. Who is the most popular faculty member on the Web?

3) Subset either (1) or (2) to include only requests that come from the neighborhood:
     152.1.*.*     NCSU fixed addresses
     152.14.*.*    wireless addresses
     152.2.*.*     UNC-Chapel Hill

Brief explanation of 'Common Log Format'

   requestor - - [date/time timezone] "GET filerequest protocol" statuscode filesize

Here are 3 records:

67.195.115.229 - - [18/Jul/2010:04:05:37 -0400] "GET /~dickey/SAScode/SUGI03/ HTTP/1.0" 200 1555
207.46.204.188 - - [18/Jul/2010:04:07:52 -0400] "GET /~weems/conferences0910.html HTTP/1.1" 200 3402
67.195.115.229 - - [18/Jul/2010:04:08:48 -0400] "GET /~davidian/st810a/ttest.R HTTP/1.0" 200 3150

So in the first record, 67.195.115.229 is the IP address of the requesting computer, -0400 is the timezone, HTTP/1.0 is the protocol (most are HTTP, a few FTP), 200 is the status code for a successful request, and 1555 is the size of the file in bytes. Other status codes are 3xx for redirection, 4xx for client error, and 5xx for server error.

*** Note that in the directory structure, the character '~' may appear as '%7e' or '%7E', since its ASCII code is hexadecimal 7E.
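For question 1, one possible approach is to pull the time of day out of each record and count records per time bin. The Python sketch below is only an illustration, not a required method: the regular expression `CLF_PATTERN` and the function `requests_by_time_bin` are hypothetical names, and the pattern assumes every record looks like the three sample records above. Choosing `seconds_per_bin` lets the same function count by hour, minute, 10-second interval, or second.

```python
import re
from collections import Counter

# Assumed regex for one Common Log Format record:
#   requestor - - [date/time timezone] "GET filerequest protocol" statuscode filesize
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ '
    r'\[(?P<date>\d{2}/\w{3}/\d{4}):(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2}) (?P<tz>[+-]\d{4})\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def requests_by_time_bin(lines, seconds_per_bin=3600):
    """Count records per time-of-day bin.

    seconds_per_bin=3600 gives hour of day, 60 gives minute of day,
    10 gives 10-second intervals, and 1 gives second of day.
    Lines that do not match CLF_PATTERN are skipped.
    """
    counts = Counter()
    for line in lines:
        m = CLF_PATTERN.match(line)
        if m is None:
            continue
        secs = int(m["hh"]) * 3600 + int(m["mm"]) * 60 + int(m["ss"])
        counts[secs // seconds_per_bin] += 1
    return counts


sample = [
    '67.195.115.229 - - [18/Jul/2010:04:05:37 -0400] "GET /~dickey/SAScode/SUGI03/ HTTP/1.0" 200 1555',
    '207.46.204.188 - - [18/Jul/2010:04:07:52 -0400] "GET /~weems/conferences0910.html HTTP/1.1" 200 3402',
    '67.195.115.229 - - [18/Jul/2010:04:08:48 -0400] "GET /~davidian/st810a/ttest.R HTTP/1.0" 200 3150',
]

by_hour = requests_by_time_bin(sample)
uniform = sum(by_hour.values()) / 24  # expected count per hour if requests were uniform
for hour in range(24):
    print(f"{hour:02d}:00  {by_hour[hour]:6d}  (uniform: {uniform:.1f})")
```

To run it on a real log, pass an open file instead of `sample`, for example `with open("w4000log", errors="replace") as f: by_hour = requests_by_time_bin(f)`, assuming the file is in the current directory. The bins use the clock time as recorded in the log (server local time, here -0400); the timezone offset is captured but not applied.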
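Questions 2 and 3 can be sketched the same way: take the path after '~' as the faculty id, converting '%7e' or '%7E' to '~' first, and optionally keep only requests whose address starts with one of the neighborhood prefixes. This is a minimal Python illustration under stated assumptions: `LINE_PATTERN`, `NEIGHBORHOOD`, `faculty_id`, `in_neighborhood`, and `requests_by_faculty` are hypothetical names, and treating the first path segment after '~' as the faculty id is an assumption based on the sample records.

```python
import re
from collections import Counter

# Assumed pattern: capture the requesting host and the requested path from
#   host - - [date/time tz] "METHOD path PROTOCOL" status size
LINE_PATTERN = re.compile(r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)')

# Neighborhood prefixes from question 3. The trailing dot keeps "152.1."
# from also matching addresses such as 152.10.x.x or 152.14.x.x.
NEIGHBORHOOD = ("152.1.", "152.14.", "152.2.")


def faculty_id(path):
    """Return the user id after '~' (also written %7e or %7E), else None."""
    path = re.sub(r"%7e", "~", path, flags=re.IGNORECASE)
    m = re.match(r"/~([^/?#]+)", path)
    return m.group(1) if m else None


def in_neighborhood(host):
    """True if the requesting address starts with a neighborhood prefix."""
    return host.startswith(NEIGHBORHOOD)


def requests_by_faculty(lines, neighborhood_only=False):
    """Tabulate requests per faculty id, optionally only from NEIGHBORHOOD."""
    counts = Counter()
    for line in lines:
        m = LINE_PATTERN.match(line)
        if m is None:
            continue
        if neighborhood_only and not in_neighborhood(m["host"]):
            continue
        fid = faculty_id(m["path"])
        if fid is not None:
            counts[fid] += 1
    return counts


sample = [
    '67.195.115.229 - - [18/Jul/2010:04:05:37 -0400] "GET /~dickey/SAScode/SUGI03/ HTTP/1.0" 200 1555',
    '207.46.204.188 - - [18/Jul/2010:04:07:52 -0400] "GET /~weems/conferences0910.html HTTP/1.1" 200 3402',
    '67.195.115.229 - - [18/Jul/2010:04:08:48 -0400] "GET /~davidian/st810a/ttest.R HTTP/1.0" 200 3150',
]

table = requests_by_faculty(sample)
for fid, n in table.most_common():
    print(f"{fid:12s} {n}")
print("neighborhood only:", requests_by_faculty(sample, neighborhood_only=True))
```

On the full log, `table.most_common(1)` gives the most requested faculty id. None of the three sample records comes from the neighborhood, so the filtered table for them is empty. The same `in_neighborhood` test can be applied to the lines before any other tabulation, such as counts by hour.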