ST 590G -- Computation for Data Analysis
Second Assignment -- due Thursday, 01 October 2009

In the 'weblogs' directory are five files of weekly weblogs from the
Department of Statistics www4 web server. Each file corresponds to a week
and has 70,000-120,000 records. Choose one file among w4log0-w4log4 by
taking your student id modulo 5. Since these files are so large, you may
want to debug using a smaller file: w4000log with 1000 records. An
explanation of the fields is at the bottom.

1) Find the IP addresses of the top 10 requestors (typically bots such as
   yahoo or google).

2) Create a table of the number of requests by time of day and day of week
   (24x7) (*or* divide day or week into fewer -- but still interesting --
   pieces) for
   a) all successful requests
   b) requests by top 10 requestors from (1).

3) Create a table of requests for each faculty id. Who is the most popular
   faculty member on the Web?

4) Write a macro to read in the file and use the macro for one of the tasks
   above.

FOR ALL EXERCISES: Hand in
(1) Your program
    * With at least as many comments as you have toes! *
(2) Your output
(3) The answers to the questions asked.

Brief explanation of 'Common Log Format'

requestor - - [date/time timezone] "GET filerequest protocol" statuscode filesize

Here are 3 records:

66.235.124.18 - - [30/Aug/2009:04:02:37 -0400] "GET /%7Eosborne/st512/handouts/hw10-key.pdf HTTP/1.1" 200 61616
74.6.18.231 - - [30/Aug/2009:04:04:50 -0400] "GET /~stefanski/NSF_Supported/Hidden_Images/orly_owl_files/orly_owl_Lin_4p_5_ramp.txt HTTP/1.0" 200 103751
65.55.207.118 - - [30/Aug/2009:04:06:58 -0400] "GET /~reif/Site/Project_files/final-project_data-README.pdf HTTP/1.0" 200 11836

so in the first case, 66.235.124.18 is the IP address of the requesting
computer, -0400 is the timezone, HTTP/1.1 is the protocol (most are HTTP, a
few FTP), 200 is the status code for a successful request, and 61616 is the
size of the file. Other codes are 3xx for redirection, 4xx client error,
5xx server error.
*** Note that in the directory structure, the character '~' may appear as
'%7e' or '%7E', since its ASCII code is hexadecimal 7E.
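
As a rough illustration of the record layout described above, here is a
minimal Python sketch that splits one record into its fields. The handout
does not name a language, so Python is only an assumption here, and the
names LOG_PATTERN, parse_record, and SAMPLE are hypothetical helpers rather
than anything supplied with the assignment. It also assumes English month
abbreviations (as in 'Aug') and treats a filesize of '-' as 0. The sample
lines are two of the three records quoted above.

```python
import re
from datetime import datetime
from urllib.parse import unquote

# Hypothetical pattern for one record, following the layout
# requestor - - [date/time timezone] "GET filerequest protocol" statuscode filesize
LOG_PATTERN = re.compile(
    r'^(?P<requestor>\S+) \S+ \S+ '
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_record(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # %b assumes English month abbreviations; %z reads offsets such as -0400.
    rec["when"] = datetime.strptime(rec["when"], "%d/%b/%Y:%H:%M:%S %z")
    # unquote turns '%7E' or '%7e' back into '~'.
    rec["path"] = unquote(rec["path"])
    rec["status"] = int(rec["status"])
    # Assumption: a '-' in the filesize field is counted as 0 bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


# Two of the records quoted in the handout.
SAMPLE = [
    '66.235.124.18 - - [30/Aug/2009:04:02:37 -0400] '
    '"GET /%7Eosborne/st512/handouts/hw10-key.pdf HTTP/1.1" 200 61616',
    '65.55.207.118 - - [30/Aug/2009:04:06:58 -0400] '
    '"GET /~reif/Site/Project_files/final-project_data-README.pdf HTTP/1.0" 200 11836',
]

if __name__ == "__main__":
    for line in SAMPLE:
        rec = parse_record(line)
        # weekday() counts Monday as 0 and Sunday as 6.
        print(rec["requestor"], rec["when"].weekday(), rec["when"].hour,
              rec["status"], rec["path"])
```

Returning None for lines that do not match lets a caller count and inspect
malformed records instead of stopping on them, which may be useful with
files of this size.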