ST590G -- FALL 2010
Class Exercise #1 -- due Thursday, 26 August 2010

Our Group ___________________________________________________

The files are in the 'aug10' directory.

In the file 'NCinfmort.txt' are county-level summaries of infant
mortality for 2004-2008. This file is tab-delimited and was derived
from the website
http://www.schs.state.nc.us/SCHS/data/databook/CD12C%20inf%20death%20rates.html
We are interested here only in the county-level data, so you may just
want to use FIRSTOBS=6.

a) Read in the data in this file and create a dataset with the
   relevant variables.

In the file 'nc2010econ.dat' are selected economic, educational, and
demographic data for North Carolina at the county level. The list of
variables is given in the file NC2010econ.variables. (Data courtesy of
Tammy J Lester, NC Dept of Commerce.) (Yes, this is the same as the
last assignment for ST445.)

b) Read in these related variables and create a dataset.

c) Merge these two datasets by county name.

d) The main task of the assignment is to look for a linear
   relationship between infant mortality and one of the economic
   variables.

Note:

1) Be aware that many counties in North Carolina are quite small, both
   in area and in population, so analysis of count data from these
   counties requires care. My suggestion is to limit your analysis to
   the largest counties and avoid the smallest.

2) You should be aware that the largest counties will have the largest
   counts, and the smallest counties the smallest -- of anything --
   regardless of any cause/effect. For example, Mecklenburg and Wake
   counties have the highest numbers of doctors and the highest
   incidence of cancer, while Camden and Clay counties have small
   numbers of both. But does the apparent positive correlation here
   mean that we should try to reduce the number of doctors to reduce
   cancer incidence? Base any relationship on RATES: number/population.
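
The point of Note 2 can be seen in a small numerical sketch. The example below is in Python purely for illustration. The six counties and all of their numbers are invented for this example and are NOT taken from the assignment files, and the helper name pearson is likewise hypothetical. It compares the correlation of raw counts with the correlation of rates per 1,000 residents.

```python
from math import sqrt

# Synthetic, made-up data for six hypothetical counties (NOT real NC data):
# (name, population, number of doctors, number of cancer cases)
counties = [
    ("County A",  10_000,   20,   50),
    ("County B",  20_000,   20,   90),
    ("County C",  50_000,  125,  200),
    ("County D", 100_000,  150,  550),
    ("County E", 400_000,  880, 1920),
    ("County F", 900_000, 1620, 4140),
]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Raw counts: both variables grow with population.
doctors = [d for _, _, d, _ in counties]
cancer = [c for _, _, _, c in counties]

# Rates per 1,000 residents: number / population, rescaled.
doctor_rate = [1000 * d / pop for _, pop, d, _ in counties]
cancer_rate = [1000 * c / pop for _, pop, _, c in counties]

r_counts = pearson(doctors, cancer)
r_rates = pearson(doctor_rate, cancer_rate)

print(f"correlation of counts: {r_counts:.3f}")
print(f"correlation of rates:  {r_rates:.3f}")
```

With these invented numbers the counts are almost perfectly positively correlated, simply because both scale with population, while the rates are weakly negatively correlated. Real county data may behave differently; the sketch only shows why the choice between counts and rates matters.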