ST 590G -- Computation for Data Analysis
Third Assignment -- due Thursday, 12 November 2009 (updated 02 Nov)

Recall our examination of the Federalist Papers files at the sites

http://www.foundingfathers.info/federalistpapers/fedxx.htm
where xx takes values from 01 to 85. (Note there are two versions of 70: 'fed70a' and 'fed70b' -- choose one.)

* or * (note change below)

http://thomas.loc.gov/home/histdox/fed_xx.html
where again xx takes values 01 to 85. (Note that this site has the second version of #70, as the file is 'fed_70-2.html')

* or * (a third site)

http://www.constitution.org/fed/federaxx.htm
where xx takes values 01 to 85.

* or * (a fourth site)

http://www.let.rug.nl/usa/D/1776-1800/federalist/fedxx.htm
where xx takes values 01 to 85. (Same two versions as with the 'FoundingFathers' site. Note the country of this host.)

Find the word frequency of the words used by Mosteller & Wallace in 'mwwords.dat' for the observations that are needed for the following discriminant analysis. (Note: treat adverbial/adjectival or plural forms as the same word: "CONSIDERABLY" = "CONSIDERABLE", "INNOVATIONS" = "INNOVATION", "VIGOROUS" = "VIGOR", "MATTERS" = "MATTER", "WORKS" = "WORK".)

Among historians, the general agreement is that John Jay wrote 2, 3, 4, 5, and 64; James Madison wrote 10, 14, 37, 38, ..., 48; no one is sure about 18, 19, 20, 49, 50, ..., 58, 62, 63; and the others were written by Alexander Hamilton.

Your task is to create two datasets for discriminant analysis.

1) Create one dataset with word frequencies for the papers with known authorship. Include here all of the (14) Madison papers, and at least as many Hamilton papers. (Use at least 14 of the Hamilton papers; use all of them if you want. You can choose to include the Jay papers or just delete them from the analysis.) Include a variable indicating the author.

2) Create a second dataset with the 15 papers where the authorship is not certain. Include a variable 'author' with missing values.
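As one possible starting point, the sketch below (in Python, which is not required for the assignment) shows how the author labels and the merged word forms described above might be coded. It assumes the text of a paper has already been downloaded and stripped of HTML, and that the target words are supplied as a plain list; the layout of 'mwwords.dat' is not described here, so reading that file is left out. The names author_of, word_counts, and MERGED_FORMS are made up for this illustration.

```python
import re
from collections import Counter

# Variant forms the assignment says to treat as the same word.
MERGED_FORMS = {
    "considerably": "considerable",
    "innovations": "innovation",
    "vigorous": "vigor",
    "matters": "matter",
    "works": "work",
}

# Paper numbers by author, as listed in the assignment.
JAY = {2, 3, 4, 5, 64}
MADISON = {10, 14} | set(range(37, 49))                   # 10, 14, 37..48
DISPUTED = {18, 19, 20, 62, 63} | set(range(49, 59))      # 18-20, 49..58, 62, 63


def author_of(paper):
    """Return the author label for a paper number, or None if disputed."""
    if paper in JAY:
        return "Jay"
    if paper in MADISON:
        return "Madison"
    if paper in DISPUTED:
        return None  # missing value for the second dataset
    return "Hamilton"


def word_counts(text, target_words):
    """Count each target word in text, merging the variant forms.

    Returns (counts, total), where counts maps each normalized target
    word to its count and total is the number of word tokens, so that
    rates can be computed if wanted.
    """
    # Assumption: a "word" is a run of letters; case is ignored.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(MERGED_FORMS.get(t, t) for t in tokens)
    targets = [MERGED_FORMS.get(w.lower(), w.lower()) for w in target_words]
    return {w: counts[w] for w in targets}, len(tokens)
```

Calling author_of on each number from 1 to 85 splits the papers into the two datasets: those with a label go into the first, and those returning None go into the second.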
** If you would like to do your own discriminant analysis, my demos are in 'tornado.sas' in the oct09 directory. **