Choropleth Maps with ArcGIS
Time for completion: ~4 hours

Outline

1 - Make the data
 1.1 - Get the data in a spreadsheet
 1.2 - Make a dbf
2 - Edit the states shapefile
3 - Join Crime to States
4 - Get the Election data
5 - Project the data
6 - Calculate the Point Biserial Correlation (optional)
 6.1 - Get the proportion of values
 6.2 - Get means for D and R
 6.3 - Get standard deviation
 6.4 - Do the math
 6.5 - Ecological fallacy
7 - Make the map layout
8 - Symbolize with ColorBrewer
9 - Classify, symbolize and copy (revised)
10 - Make the maps
 10.1 - ArcMap is not graphics software
11 - Overlay election results
12 - Optional Map(s)

Purpose

The general purpose of this exercise is to make maps of 2003 crime vs. 2004 election results for the Conterminous US. The crime data are standardized as a rate (#/100,000) so some of the enumeration unit problems that stem from the arbitrary division of space imposed by state boundaries have been ameliorated. Data standardization and the enumeration unit problem are addressed in Slocum 2005, Ch. 13 Choropleth Mapping. You will be classifying each of the four types of crime (Violence, Murder, Rape, and Robbery), and symbolizing each with a color scheme. You will use ColorBrewer to look at different color schemes.
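As a small illustration of what "standardized as a rate" means, the sketch below converts a raw count into a rate per 100,000 residents. The function name and the figures are hypothetical and are not taken from the crime table.

```python
def rate_per_100k(count, population):
    """Return the number of events per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return count * 100_000 / population


# Hypothetical figures: 250 incidents in a state of 500,000 people.
print(rate_per_100k(250, 500_000))  # 50.0
```

Two states with very different populations can then be compared on the same scale.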

One observation I would like you to make is how each classification type changes the overall appearance of the map. Those who look at your map will look for the relationship between election results and crime, but the visual impression of this relationship changes depending on the classification. A further issue to contemplate is scale: how would the maps look with county-level data? Would the same relationships seen at the state level translate to counties around major cities? Although it would be nice to have the data necessary to address this, county-level crime statistics for the conterminous United States do not exist in a tangible form.

Step 1 - Make the Crime data

First create a folder in C:\WorkSpace called "choropleth_maps" and copy the States shapefile from
C:\Program Files\ArcGIS\Bin\TemplateData\USA to that folder.

Try this link first [ Crime by State ]. If that link doesn't work, go here [ http://www.infoplease.com/ ], type in "crime by state" at the top, and click the Search button. You are looking for a table called "Crime Rates by State, 2004".

Once you get to the page, copy all the values: the states and their associated rates. Copy just the values in the table: click and hold at the bottom right-most cell, drag up to the top left, then right-click and choose Copy. Start Excel.

Step 1.1 - Get the data in a spreadsheet

Paste the copied text into the empty spreadsheet. To do this, right-click and choose Paste Special, then choose Text from the Paste Special dialog box.

Remove the top four rows. Select row 1, hold down Shift, and select the other rows, then right-click and choose Delete.

Insert a row at the top of the table. To do this, right-click on row 1 and choose Insert.

Add six column labels in the inserted row (STATE_NAME, VIOLENT, MURDER, RAPE, ROBBERY, PROPERTY).

Un-abbreviate the names of the states. Rename each of the 52 states listed in the Excel file with their proper names. Knowing state abbreviations might help a little, but the people who made this table did not always follow them.
Note: Spelling counts. Make sure you have all the state names correct.

Also, do not include Washington DC. It will not show up on the maps you will be making and it is a statistical outlier.
( http://en.wikipedia.org/wiki/Crime_in_Washington,_D.C. )

Widen the columns by double-clicking on the margin between the column headings; Excel will automatically widen each column to fit the longest value it contains.

Excel is generally useful software, but some things about it are problematic. One of these is how it deals with dBase (.dbf) files.

Step 1.2 - Make a dbf

Select all the columns containing data, go up to File -> Save As. Choose DBF 4 (dBASE IV) from the Save as type: menu. Save it as crime_by_state_<year>.dbf (<year> is the year of the data you copied)

Warning: Excel does not like to let go of files that are not in its own format (.xls), so you might have to tell it a few times to save/close the spreadsheet.

Note: In the text and screenshots shown below you will see "_2003", please disregard this.

Close Excel; the file cannot be open in Excel and in ArcMap at the same time.

Step 2 - Edit the shapefile

Start ArcMap and add the states shapefile and crime_by_state_2003.dbf from the choropleth_maps folder. Get rid of Alaska and Hawaii.

To do this, load the Editor Toolbar (in ArcMap go up to View -> Toolbars -> Editor). Click on the Editor pulldown menu and go to Start Editing. Get the Edit Tool, select Alaska and Hawaii, delete them, and then do Editor -> Save Edits followed by Editor -> Stop Editing.

Step 3 - Join crime to states

Right-click on the states and go to Joins and Relates -> Join. The input name (in the state shapefile's dbf) is STATE_NAME, the table is crime_by_state_2003.dbf, and the name it contains that will be joined is STATE_NAME. Click the Advanced button and select Keep only matching records, click OK, then OK again, and select No when asked if you want to build an index.
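To make the effect of "Keep only matching records" concrete, here is a minimal Python sketch of the same idea outside ArcMap. It is only an illustration of the join logic, not how ArcMap implements it; the function name, the STATE_ABBR field, and the sample rows (including the deliberate misspelling) are hypothetical.

```python
def join_keep_matching(states, crime_rows, key="STATE_NAME"):
    """Join crime rows to states, keeping only states that have a match."""
    crime_by_name = {row[key]: row for row in crime_rows}
    joined = []
    for state in states:
        match = crime_by_name.get(state[key])
        if match is not None:
            # Combine the attributes of both records into one.
            joined.append({**state, **match})
    return joined


# Hypothetical sample rows; "Nevda" is misspelled on purpose.
states = [
    {"STATE_NAME": "Nevada", "STATE_ABBR": "NV"},
    {"STATE_NAME": "Utah", "STATE_ABBR": "UT"},
]
crime = [
    {"STATE_NAME": "Utah", "MURDER": 1.0},
    {"STATE_NAME": "Nevda", "MURDER": 2.0},
]
print(join_keep_matching(states, crime))
```

Only Utah survives the join here, because the names must match exactly.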

The results of the join will be displayed in ArcMap. If there is an error in the dbf, the affected polygons will not show up: they have no matching records, so you can see whatever lies beneath them. In the screenshot below I have misspelled South Dakota and Nevada in the dbf, so these polygons did not join.

To fix this, open crime_by_state_2003.dbf in Excel and fix the name(s). Perhaps the column was not wide enough when you saved it, or the name is misspelled. To redo the join you first have to remove it: right-click on the states, go to Joins and Relates -> Remove Join(s), and choose the dbf to remove.

After you've fixed the error, remake the join: right-click on the states, go to Joins and Relates -> Join, and put in the information as you did before. Check that it worked by clicking with the Identify tool and confirming that the attributes for the fields are linked to the polygons correctly.

Once you're satisfied, right-click on states_Layer and do Data -> Export Data (this permanently combines the joined data). Save the shapefile to the choropleth_maps folder and call it States_Crime03.shp.
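If you would rather find the offending names before redoing the join, a short Python sketch like the one below can list the names that have no exact match and suggest the closest spelling. It uses the standard library's difflib; the function name and the two name lists are hypothetical stand-ins for the STATE_NAME columns of the shapefile and the dbf.

```python
import difflib


def unmatched_names(shapefile_names, dbf_names):
    """Map each shapefile name with no exact match to its closest dbf spelling."""
    dbf_set = set(dbf_names)
    report = {}
    for name in shapefile_names:
        if name not in dbf_set:
            # Returns a list holding the best candidate, or an empty list.
            report[name] = difflib.get_close_matches(name, dbf_names, n=1, cutoff=0.6)
    return report


# Hypothetical name lists with two deliberate misspellings in the dbf.
shp_names = ["Nevada", "South Dakota", "Utah"]
dbf_names = ["Nevda", "South Dekota", "Utah"]
print(unmatched_names(shp_names, dbf_names))
```

An empty result means every name in the shapefile has an exact counterpart in the dbf.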

Step 4 - Get the Election data, join it to States_Crime03

Save [ this file ] to the choropleth_maps folder and unzip it. This is a zipped shapefile that has the election data in its attribute table along with 2000 Census statistics.

Load the extracted shapefile, Election_04, in the ArcMap layout containing States_Crime03. Create a join that combines the attributes of States_Crime03 and Election_04: right-click on the Election_04 layer and choose Joins and Relates -> Join.

In the Join Data dialog specify the attribute name to join by (STATE_NAME). Click OK. Click No when asked if you want to build an index.

Verify that the join worked by using the Identify tool to click on a state.

Step 5 - Project the data frame and export the combined dataset

Go up to View -> Data Frame Properties, under Predefined choose Projected Coordinate Systems -> Continental -> North America, and choose USA Contiguous Equidistant Conic. Click OK. There is no analytical reason for this, since we are not calculating area or distance; it is for the sake of appearance (the US-Canada border is not a straight line on a globe).

After you've projected the data frame, right-click on Election_04 (to which you have joined the attribute table of States_Crime03), and do Data -> Export Data. Set it to Use the data frame's coordinate system, which will preserve the projected coordinate system permanently, and name the output "Election04_Crime03_equ.shp".

Click No when asked if you want to add it to ArcMap, and create a new empty layout.

Step 6 (optional) - Calculate the Point Biserial Correlation

We will be using the data frame containing the final version of Election04_Crime03_equ to extract some statistics. The indicator variable is D or R (ELECTVTE in the Election04_Crime03_equ shapefile). Crime has been normalized by state population as a rate.

This is the equation we will be using, but first we need to collect all the variables.
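For reference, the point biserial correlation is commonly written in the form below. The symbols are chosen here for illustration and are an assumption, not necessarily those used in the original handout.

```latex
r_{pb} = \frac{\bar{X}_D - \bar{X}_R}{s_X}\,\sqrt{P_D \, P_R}
```

Here \bar{X}_D and \bar{X}_R are the mean crime rates of the D and R states, s_X is the standard deviation of the crime rate over all states (the population form, dividing by n, is assumed), and P_D and P_R = 1 - P_D are the proportions of states labeled D and R. With this ordering the result is negative when the D states have the lower mean.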

Step 6.1 - Get the proportion of values

Get some paper and a pen to record the values you will be collecting in the following steps. Following the equation above, we need to calculate the proportion of results that were D (the proportion of polygons with ELECTVTE = D), and the proportion that were R. To do this go up to Selection -> Select By Attributes and make a query for D. Click Apply and then Close.

With the D polygons selected, right-click on Election04_Crime03_equ and open the Attribute Table. At the bottom of the Attribute Table, next to Records, ArcGIS tells you, in brackets, how many records are selected. This is the number of polygons in Election04_Crime03_equ that have the label D (the rest are R); dividing it by the total number of polygons gives the proportion.

So for the P variable:

P_D = 18/48 = 0.375

P_R = (48-18)/48 = 0.625

Step 6.2 - Get means for D and R

With all the D records selected, scroll over until you see the first of the crime fields. Right-click on the column heading and choose Statistics.

The summary statistics are only for the D records, the records that are selected. You can also write down the other crime rate means for the D records by using the pulldown menu next to Field (indicated in the screenshot below).

Close that summary statistics window when you have finished writing down the Mean for each of the crime rates associated with the D polygons. Go up to Selection -> Clear Selected Features, then make a new query to select all the R polygons. Repeat what you just did to get the means for the R polygons.
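The select-then-summarize routine above amounts to computing one mean per group. A minimal Python sketch of that idea follows; the function name is hypothetical and the four records are made-up values, not the real crime rates.

```python
from statistics import mean


def group_means(records, group_field, value_field):
    """Return the mean of value_field for each distinct value of group_field."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_field], []).append(rec[value_field])
    return {label: mean(values) for label, values in groups.items()}


# Made-up records standing in for rows of the attribute table.
records = [
    {"ELECTVTE": "D", "VIOLENT": 300.0},
    {"ELECTVTE": "D", "VIOLENT": 500.0},
    {"ELECTVTE": "R", "VIOLENT": 450.0},
    {"ELECTVTE": "R", "VIOLENT": 650.0},
]
print(group_means(records, "ELECTVTE", "VIOLENT"))  # {'D': 400.0, 'R': 550.0}
```

Calling it once per crime field gives the same numbers you would otherwise copy down from the Statistics window.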

Step 6.3 - Get standard deviation for each of the crime rate fields

Keep the Attribute Table open but clear the selection. Highlight the VIOLENT field and get the standard deviation for all the records by right-clicking on the column heading and choosing Statistics.

For the calculation we need the Standard Deviation of each of the crime fields. Record the standard deviation for VIOLENT, then highlight each of the other crime fields and record their standard deviations.
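There are two common definitions of standard deviation, and which one the Statistics window reports is not stated here, so it is worth knowing the difference. The sketch below, using made-up values, compares the population form (divide by n) with the sample form (divide by n - 1) from Python's statistics module.

```python
from statistics import pstdev, stdev

# Made-up values, chosen so the population standard deviation is a round number.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_sd = pstdev(values)  # divides the squared deviations by n
sample_sd = stdev(values)       # divides the squared deviations by n - 1

print(round(population_sd, 3))  # 2.0
print(round(sample_sd, 3))      # 2.138
```

With 48 states the two differ by only about one percent, so either choice leads to the same interpretation.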

Step 6.4 - Do the math

We have what we need to calculate the point biserial correlation for violent crime. What follows is an example of how to do this for D states. If you are comfortable with Excel (or something better), feel free to use a spreadsheet for the calculations. If you work best with pencil and paper, that is just as good.

With the values for mean of Violent Crime (D), for mean of Violent Crime (R), the Standard Deviation for Violent Crime, and the Proportion of values for D, calculate the statistic.

So what does this number mean? The statistic ranges from -1 to +1. The result is negative, which implies that lower violent crime rates are associated with states that voted Democrat and higher rates with states that voted Republican. The correlation is mild (depending on the expectation). Note: we assume that a linear relationship exists between the two variables.

Calculate the Point Biserial Correlation for each crime rate.
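For those using "something better" than a spreadsheet, here is a hedged Python sketch of the whole calculation. It assumes the common form of the statistic (difference of group means, divided by the population standard deviation of all values, times the square root of the product of the two proportions). The function name and the tiny dataset are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(values, labels, positive="D"):
    """Point biserial correlation between numeric values and a two-valued label."""
    group1 = [v for v, lab in zip(values, labels) if lab == positive]
    group0 = [v for v, lab in zip(values, labels) if lab != positive]
    if not group1 or not group0:
        raise ValueError("both groups need at least one record")
    p = len(group1) / len(values)  # proportion labeled `positive`
    q = 1 - p                      # proportion with the other label
    s = pstdev(values)             # population standard deviation of all values
    if s == 0:
        raise ValueError("values have no variation")
    return (mean(group1) - mean(group0)) / s * sqrt(p * q)


# Hypothetical toy data: two D records with low values, two R records with high ones.
r = point_biserial([1.0, 2.0, 3.0, 4.0], ["D", "D", "R", "R"])
print(round(r, 3))  # -0.894
```

Run it once per crime field, passing that field's values together with the ELECTVTE labels.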

More about the Point Biserial Correlation [ http://www.cmh.edu/stats/definitions/biserial.htm ]

Step 6.5 - Beware of the ecological fallacy

When discussing this exercise, Dr. Sweeney admonished me: warn your students about the Ecological Fallacy. So here is the warning. Although you may find a statistical indicator that supports a particular prejudice, it is important to remember that the data you are working with are significantly aggregated (see "Enumeration Unit Problem" covered in Slocum 2005, Ch. 13, Choropleth Mapping). When discussing conclusions regarding these data, remember that the results are only relevant to a particular state as a whole, or to the Conterminous US, not to individual residents of those states. This is a prevalent fallacy of logic, and all too often maps are used in support of these types of incorrect arguments. One can, however, say that there appears to be a mild statistical relationship between a state voting Democrat and having a lower rate of violent crime ... not that Republicans are criminals.

Now let's create some maps of these data and explore how we can represent the variables.

Step 7 - Make map layout for crime

To the new layout add the shapefile you just created, Election04_Crime03_equ.shp. Switch to View -> Layout View. Go up to File -> Page and Print Setup... and change the Orientation to Landscape. Add some guidelines, leave a half inch margin around the edge and add two more guidelines dividing the page into quarters. Position the one data frame you have into the upper left quadrant. Rename Layer to "Violent Crime (per 100k)".
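If you want to place the guidelines by number rather than by eye, the positions are simple arithmetic. The sketch below assumes a US Letter page in landscape (11 x 8.5 inches), which is an assumption about your page setup; the function name is hypothetical.

```python
def guide_positions(page_width, page_height, margin=0.5):
    """Return (vertical, horizontal) guide positions in page units."""
    vertical = [margin, page_width / 2, page_width - margin]
    horizontal = [margin, page_height / 2, page_height - margin]
    return vertical, horizontal


# Assumed page size: US Letter, landscape, measured in inches.
print(guide_positions(11.0, 8.5))  # ([0.5, 5.5, 10.5], [0.5, 4.25, 8.0])
```

The middle value in each list is the guide that divides the page into quarters.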

With the data frame selected, right-click on it and choose Copy.

Paste the copied data frame 3 times and fit them into the three empty quadrants. Each data frame contains the same projected dataset.

Rename each of the copy-pasted layers, "Murder (per 100k)", "Rape (per 100k)", and "Robbery (per 100k)". Expand the Violent Crime (per 100k) layer and rename the data link (Election04_Crime03_equ) with a "1" on the end. The reason for this will become clear in a minute.

Now that you have ArcMap with four data frames aligned with each other, Save the map layout as "election04_vs_crime03.mxd"

Step 8 - Symbolize the Violent Crime data frame with ColorBrewer

Change the color symbolization for the Violent Crime (per 100k) data frame. Use ColorBrewer (linked below) to find a "sequential" color scheme to use as a template. You can modify the color scheme recommended by ColorBrewer, but it is a good place to start and it will familiarize you with using RGB color values.

ColorBrewer [ http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer.html ]

Click the sequential button in the top left of the ColorBrewer webpage. Click on the different sequential color schemes to see how they look in the hypothetical map in the webpage. Once you find a color scheme you like using ColorBrewer you can look up the RGB values (click the rgb button at the top, shown in the screenshot below).


The screenshot on the right is from the ColorBrewer webpage. The vertical bar shows the colors, and the green numbers to the right represent the RGB values that will produce that color.

For example, look at the bottom color, it is a mixture of mostly Red (152) and some Blue (67) making it dark reddish purple.

RGB is based on 8-bit values: the value for each primary color (Red, Green, and Blue) goes from 0 to 255. Look at the top color, it is pretty close to white. White would be Red 255, Green 255, and Blue 255.
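To get a feel for the 0 to 255 range, the sketch below checks that each channel fits in 8 bits and prints the equivalent hexadecimal color code. The function name is hypothetical, and the Green value of 0 for the dark reddish purple is an assumption, since only its Red and Blue values are given above.

```python
def rgb_to_hex(red, green, blue):
    """Convert 8-bit RGB values to a hexadecimal color string."""
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must be between 0 and 255, got {value}")
    return f"#{red:02X}{green:02X}{blue:02X}"


print(rgb_to_hex(255, 255, 255))  # #FFFFFF (white)
print(rgb_to_hex(152, 0, 67))     # #980043 (Green of 0 is assumed)
```

Each pair of hexadecimal digits covers exactly the same 256 levels as one RGB value.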

Use these RGB values to symbolize one of the classified layers in ArcMap (the one you put a "1" on the end of).

Step 9 - Classify and symbolize one layer, copy the symbolization for the others

In order to use the 6 RGB color values you've decided upon with the aid of the ColorBrewer web interface, you need to first classify the data. Slocum Ch 5 Data Classification covers this topic succinctly.

Open the Layer Properties dialog for the Violent Crime (per 100k) layer, go to the Symbology tab, go to Quantities -> Graduated colors, click Classify and choose an appropriate classification method, then click Apply. The colors do not matter, since you will be changing them.

I used Quantiles with 6 classes. See pages 82-83 of Data Classification. You may determine there is a more appropriate classification, so look at the distribution of the data. Under the Symbology tab click the Classify... button, then check on Show Std. Dev. and Show Mean in order to better see the statistical distribution of the data. Write down the Break Values used for the particular classification you choose; there are 6 break values in the example shown below. You will need these later when you create a legend. Click OK, Apply, and OK to close the Layer Properties dialog for this data frame.

Right-click on the rectangular box next to the first value range shown in ArcMap, then click on the More Colors... button at the bottom of the color palette to open the Color Selector.
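The idea behind a quantile classification is that each class holds (as nearly as possible) the same number of observations. The sketch below shows one simple way to derive break values on that principle; it is not ArcMap's exact algorithm, which may treat ties and rounding differently, and the function name and sample values are hypothetical.

```python
def quantile_breaks(values, n_classes):
    """Return the upper break value of each class for an equal-count split."""
    if not 1 <= n_classes <= len(values):
        raise ValueError("n_classes must be between 1 and the number of values")
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, n_classes + 1):
        # Index of the last observation in class i (integer ceiling of i*n/k, minus 1).
        last_index = (i * n + n_classes - 1) // n_classes - 1
        breaks.append(ordered[last_index])
    return breaks


# Hypothetical rates: twelve values split into six classes of two each.
rates = list(range(1, 13))
print(quantile_breaks(rates, 6))  # [2, 4, 6, 8, 10, 12]
```

With 48 states and 6 classes, each class would hold 8 states.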

In the Color Selector type in the RGB values you looked up using ColorBrewer.

Feel free to modify the colors as you see fit; you can move the sliders for each of the primary colors and see the resulting color shown below. I would not modify them too much, or at least wait until you see all the colors together and how the maps look. Once you're done playing with colors, move on and classify the next data frame.

Right-click on the Murder (per 100k) data frame and open the Layer Properties dialog. You will need to decide upon a suitable classification for these data values as well.

Note: Unless there is a value of 0, which there should not be, the Standard Deviation method allows only 4 (1 Std Dev) or 10 (1/2 Std Dev) Classes using these data. Thus, if you have used 6 classes as suggested previously to symbolize the Violent Crime data, this classification option is not viable because you are to copy the symbolization (color scheme) you applied to the Violent Crime data frame. Also, as another consideration, using Standard Deviations works best for data that are normally distributed. Look at the distribution and notice there is an outlier (see pages 83-84 of Data Classification).

I classified the Murder data frame using Natural Breaks (pg 84 of Data Classification). Notice the uneven distribution of the data; to make this more apparent, check on Show Std. Dev. and Show Mean.

 

To apply the same color scheme you made for 6 classes in the first data frame, click the Import button in the Layer Properties dialog. In the Import Symbology dialog select the layer you've symbolized already (you conveniently put a 1 on the end of it so you could find it easily), select Just the symbols. Click OK, and then Apply.

The data are classified differently for this data frame, but the colors for each of the 6 classes are the same as those of the first data frame.

Make sure the extent shown in each of the data frames is the same. Save the map (if you haven't already). Change the Symbology for the other two data frames accordingly; you will have to determine which Classification is best suited for the data. Use the same color scheme by using Import (and select Just the symbols) as you did to copy the symbology for the Murder (per 100k) data frame from the Violent Crime (per 100k) data frame.

Step 10 - Make the maps

For choropleth maps simplicity is best. You want to minimize the "ink to information ratio" by including as little clutter as possible, while keeping enough so that anyone who looks at your map can understand what it represents.

Step 10.1 - ArcMap is not graphic design software

I would like you to gain some dexterity with ArcMap. This part of the lab will be painful: ArcMap is not like professional graphic design software (CorelDRAW, Adobe Illustrator, Macromedia FreeHand, etc.). It has some weird behavior, but software dexterity is an important skill that you will have to develop sooner or later, so be patient and ask for help if ArcMap is not letting you do what you want.

Create a legend for one of the data layers (all you need to begin with are the rectangles with the colors). Since you've used the same color scheme for each map, just make one legend (Insert -> Legend). Drag the legend ArcMap spits out outside the map area so you can work with it more easily, and zoom in around it. Right-click and choose Convert To Graphics. Right-click again and ungroup it, then right-click again while everything is still selected and ungroup again. Delete everything but the rectangles with the colors, then insert the Class Breaks you wrote down.

Arrange the rectangles in sequential order horizontally from low to high. Next get the break values associated with the classifications for each of the layers that you wrote down. If you did not write them down you can get these values by opening the symbology for each layer, click the Classify button and look at the Break Values listed in the Classify dialog.
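If you keep the break values in a list, the legend text for each rectangle can be generated rather than typed by hand. The sketch below is one hypothetical way to do it; the function name, the minimum, and the break values are made up, and adjacent ranges simply share their endpoint.

```python
def legend_labels(minimum, break_values, decimals=1):
    """Build one 'low - high' label per class from ordered break values."""
    labels = []
    lower = minimum
    for upper in break_values:
        labels.append(f"{lower:.{decimals}f} - {upper:.{decimals}f}")
        lower = upper
    return labels


# Made-up minimum and break values for three classes.
for label in legend_labels(100.0, [200.0, 300.0, 450.5]):
    print(label)
```

Each crime map needs its own call, since the break values differ from map to map even though the colors are shared.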

It may help to remove the guidelines (right-click on the little nubs along the edge of the rulers and choose Clear). Make sure everything lines up nicely though. Remove the neatlines around each data frame, and add a neatline around all the maps.

Step 11 - Overlay the election results

It might be helpful to switch back to View -> Map View. Add another copy of Election04_Crime03_equ to only one of the data frames; its ELECTVTE field is D or R. Change the symbology so only D and R are shown.

You're in control. Use Transparency, line fills, or whatever you want to symbolize the election results for each state together with the crime for the four maps. Get a symbolization that works, then switch back to View -> Layout View and copy-paste the symbolized data layer representing the election results into the other three data frames. You will have to add a legend as you did before, convert it to graphics, ungroup and edit it.

Step 12 (optional) - Make a choropleth map of Internet Use.

As part of the Proportional Symbol Mapping lab you created the data needed to map Internet Use on a global scale with this technique as well. Use those data, Countries.shp, or use the subset you've created, Europe.shp, to classify and symbolize with an appropriate color scheme. You may be tempted to combine the two methods, but be warned that combining proportional symbols with the choropleth technique can be tricky. Feel free to combine the two as an experiment, but it is easy to create misleading visual representations. Can you make the proportional symbols from pie charts showing some of the other statistics in these data?

Here are some interesting data (health related) that one could map in many different ways using this technique.

[statemaster.com] - http://www.statemaster.com/cat/hea-health

The End


recreated by jeff 5/26/05, with help from Sarah B. and Enki Yoo, last outdated 8/28/06