Lab 3: Geocoding - Matching addresses with locations


Before proceeding with the lab, read the ArcMap help files on "About Address Locators," "The Geocoding Process," and "Address Locator Settings."


The data for this lab are distributed as a zip file. Download it and unzip it into your workspace. When you are done, your workspace should contain the following files:

eastside.dbf
eastside.ixs
eastside.mxs
eastside.sbn
eastside.sbx
eastside.shp
eastside.shx

Geocoding Data. Geocoding data involves conflating tabular data with spatial data to establish the geographic locations of records in the table. As you will see in this lab, geocoding algorithms have their limitations. You will have an opportunity to refine your data and set certain preferences, but not others.


Address Matching. Address matching is a special type of geocoding algorithm that pinpoints an address on a street network. The algorithm locates an address based on certain assumptions. Typical assumptions, and those used in this lab, are that addresses on each block are numbered from 1 to 100, that even numbers fall on the right side of the street, and that odd numbers fall on the left side. The algorithm also assumes that parcels are evenly spaced along the block. We will consider situations in which these assumptions lead to good matches and situations in which they lead to poor matches.
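The interpolation described above can be sketched in a few lines of Python. This is a minimal illustration under the lab's stated assumptions (addresses 1 to 100 along a block, even numbers on the right, odd numbers on the left, evenly spaced parcels), not the ArcGIS implementation; the function name, the coordinate tuples, and the fixed `side_offset` distance are hypothetical.

```python
import math

def interpolate_address(house_number, from_addr, to_addr, start, end, side_offset=10.0):
    """Estimate the (x, y) location of house_number on one street segment.

    Hypothetical sketch: assumes parcels are evenly spaced between from_addr
    and to_addr, even numbers on the right, odd numbers on the left.
    """
    if not from_addr <= house_number <= to_addr:
        raise ValueError("house number outside the segment's address range")
    # Fraction of the way along the segment, assuming even parcel spacing.
    t = (house_number - from_addr) / (to_addr - from_addr)
    (x1, y1), (x2, y2) = start, end
    x = x1 + t * (x2 - x1)
    y = y1 + t * (y2 - y1)
    # Unit normal pointing to the right of the direction of travel (start -> end).
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    nx, ny = dy / length, -dx / length
    # Even numbers are offset to the right, odd numbers to the left.
    sign = 1.0 if house_number % 2 == 0 else -1.0
    return (x + sign * side_offset * nx, y + sign * side_offset * ny)
```

For example, `interpolate_address(50, 1, 100, (0, 0), (99, 0))` places number 50 about halfway along an east-running segment, offset 10 units to its right (south), while number 51 lands at nearly the same distance along the segment but on the opposite (north) side. If the real address range of a block differs from the assumed one, every interpolated point shifts accordingly.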


Batch versus Interactive. Addresses are written in many ways, and even the same address may be written differently in two different databases. In this lab, you will set geocoding preferences that give some guidance on how each address is parsed. After the preferences are set, geocoding is performed either automatically (in batch) or interactively. As you will see, automatic matching performs poorly in many areas, and interactive matching is almost always necessary. Even in the best circumstances, only about 75% of addresses will match when geocoding automatically.
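To see why parsing preferences matter, the sketch below standardizes two differently written versions of the same hypothetical address into a common form before they are compared. It is a minimal illustration with a small, assumed abbreviation table; ArcGIS applies its own standardization rules, which are not reproduced here.

```python
import re

# Hypothetical abbreviation table; a real geocoder uses a much larger one.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AV", "AVE": "AV",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
}

def standardize(address):
    """Uppercase, replace punctuation with spaces, and abbreviate common words."""
    cleaned = re.sub(r"[.,#]", " ", address.upper())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)

print(standardize("123 East Valerio Street"))  # 123 E VALERIO ST
print(standardize("123 E. Valerio St."))       # 123 E VALERIO ST
```

Without a step like this, a literal string comparison would treat the two versions as different addresses.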


Preparing to geocode


In this exercise, we will match a small number of addresses to a very small section of the street network of Santa Barbara, California. The streets in Santa Barbara are difficult to address match because their naming conventions are not standard. Street names such as Calle Laureles or Alameda Padre Serra make it difficult to apply a geocoding algorithm. In this exercise, we will work only with streets that have single names. You will find that directional prefixes (such as N., W., E., and S.) also cause problems. In this task we will match five addresses to a small street grid. First, create an address database file, address.dbf. It should look exactly like the following table. If you are not certain how to create this dbf file, refer to previous labs. Save this file.



Open ArcMap and add the eastside.shp layer from the data folder for this lab, along with the address.dbf file that you just created. In the data view you should see a simple grid of streets. Use the labeling feature that we learned about in the last lab to display the street names.


Before geocoding, you need to set up an address locator. An address locator specifies which information in your base layer is used to match addresses, and how much of it.


Address Locators are created in ArcCatalog. Open ArcCatalog, find the Address Locators folder, and select Create New Address Locator.




When you create a new address locator, you need to select the geocoding style that it will use. Our data use the US Streets style, and because our street data are in shapefile format, select "US Streets (File)." Click OK.




In the Reference data, make sure the eastside.shp file from the data folder for this lab is selected. Check that the Fields match those shown below. Set the default spelling sensitivity to 60.




You have now set up an address locator, which is stored in the Address Locators folder. It is named after the login name you use on your computer: the address locator that I created is named "ting.New Address Locator," and yours might be named, for instance, "guest.New Address Locator."




Geocoding addresses, part I


Now that we have created a table of addresses to geocode and have set up an address locator, we can locate the addresses on the map.


In ArcMap, right-click the address.dbf table and select Geocode Addresses. Add your address locator and click OK.





Check the address table and input fields, and name the output shapefile. Click OK.



Question: When the Review/Rematch Addresses window pops up, what are the results of your matching? Is this what you expected? Why?


From the Review/Rematch window, select Geocoding Options. The Geocoding Options window should pop up. This window allows you to set address matching parameters.


Question: By looking at the Matching Options in the Geocoding Options window, and from reviewing the help files, you should be able to speculate on what the geocoding algorithm does. What do you think the scores mean?


In the Matching Options, reset Spelling Sensitivity to 40, Minimum match score to 30, and Minimum candidate score to 11. Select OK. When the Review/Rematch Addresses window returns, change the Rematch Criteria to All Addresses and click Match Automatically. Match Automatically attempts to re-match your addresses using the new parameter values. Experiment with modifying the three parameters until you get some partial or good matches. Do not select the Done button.
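The help files describe the scores only in general terms, so the following is a toy model for thinking about the thresholds, not the ArcGIS scoring algorithm. It assumes that each candidate street receives a 0 to 100 score, that candidates below the minimum candidate score are discarded, and that the best candidate is matched automatically only if it reaches the minimum match score and is not tied with another candidate. Python's `difflib.SequenceMatcher` stands in for the spelling comparison, the function names and status strings are hypothetical, and spelling sensitivity is not modeled.

```python
from difflib import SequenceMatcher

def score(input_name, candidate_name):
    """Toy 0-100 score: string similarity of two street names."""
    ratio = SequenceMatcher(None, input_name.upper(), candidate_name.upper()).ratio()
    return round(100 * ratio)

def classify(input_name, street_names, min_candidate_score, min_match_score):
    """Return (status, candidates) under the toy thresholds described above."""
    scored = [(score(input_name, s), s) for s in street_names]
    # Keep only candidates that reach the minimum candidate score, best first.
    candidates = sorted((c for c in scored if c[0] >= min_candidate_score), reverse=True)
    if not candidates:
        return "unmatched", []
    best_score = candidates[0][0]
    tied = len(candidates) > 1 and candidates[1][0] == best_score
    if best_score >= min_match_score and not tied:
        return "matched", candidates
    # Candidates exist, but none can be chosen automatically: review interactively.
    return "needs review", candidates
```

In this toy model, lowering `min_candidate_score` lets more streets into the candidate list, while lowering `min_match_score` lets a weaker best candidate be matched without interactive review.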


Question: List three different sets of parameters that you tried (Spelling Sensitivity, Minimum match score, Minimum candidate score), as well as the Match rate and Match quality for each of the three sets of parameters.


Question: Did you get any Good Matches?


Now set the three parameters to 18, 14, and 11 respectively. Re-match the addresses with these new values.

Question: How do you interpret the fact that you had to reduce the parameters to such low levels to get even partial matches?


You should now have five partial matches. Do not select the Done button. Now that you have match candidates, you will complete the address matching by using the Match Interactively option to select the correct streets. Select Match Interactively, and the Interactive Review window will pop up.



All of the addresses will appear in the address frame, and the candidates for matching will appear in the table at the bottom left. Examine the candidates for each address and select one by left-clicking it; it will be highlighted in blue. If there is more than one candidate, select the appropriate one and click the Match button. If there is only one candidate, you do not need to click Match. After you have interactively re-matched all of the records, select Close. When the Review/Rematch Addresses window returns, select Done. Your view will now contain a new layer with your geocoded points. Use the labeling feature to display the addresses in the view window. You may need to change some of the labeling options in the layer's Properties window (e.g., change the field used for labeling to the Address field).



Print a copy of your view and turn it in to your TA with your lab.


Question: Do you think the locations of the addresses look correct? How would you test the accuracy of this algorithm? If you had matched 10,000 addresses, how would you test the accuracy?


Question: If you were told that street addresses in Santa Barbara are numbered from 1 to 56 rather than 1 to 100, what do you think would be the effect on the matching algorithm?


Geocoding addresses, part II


Now we will look at the data table for the Eastside data set. The data set that you were given to work with was not perfectly designed for the address matching algorithm in ArcGIS. We will now fix the data table for Eastside.shp. Open the Attribute table for the Eastside shapefile.


Click on the Options button and select Add Field.



Add the following fields:


Name       Type   Width
Dir        Text   3
Sur_type   Text   5



Now we will edit the data table. Turn on the Editor toolbar and start editing. In the new Dir field, enter the street direction for each record. For example, if the street name for a record is E Valerio St., enter E in that record's Dir field. Fill in Sur_type the same way with the street type; for E Valerio St., enter St. Next, edit the Name field: remove the direction prefixes (E, N, S, and W) and the type suffixes (St, Av, etc.) so that the field contains only the name of the street. For the example above, the Name field should contain Valerio. Examples of the field names and attributes are shown below.



When you are certain that your edits are correct, save your edits and stop editing.
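The manual edits above follow a simple rule that could also be scripted. The sketch below splits a full street name such as E Valerio St. into Dir, Name, and Sur_type values. The direction and street-type lists are assumptions that cover only common cases, and the function name is hypothetical; in this lab you still make the edits in ArcMap's editor.

```python
DIRECTIONS = {"N", "S", "E", "W"}
# Assumed list of street-type suffixes; extend it to match your data.
STREET_TYPES = {"ST", "AV", "AVE", "RD", "BLVD", "DR", "LN", "WAY", "PL"}

def split_street_name(full_name):
    """Split a full street name such as 'E Valerio St.' into (Dir, Name, Sur_type)."""
    tokens = full_name.replace(".", "").split()
    direction = ""
    sur_type = ""
    # A leading single-letter direction becomes Dir (only if a name remains).
    if len(tokens) > 1 and tokens[0].upper() in DIRECTIONS:
        direction = tokens.pop(0)
    # A trailing street type becomes Sur_type (only if a name remains).
    if len(tokens) > 1 and tokens[-1].upper() in STREET_TYPES:
        sur_type = tokens.pop()
    return direction, " ".join(tokens), sur_type
```

Names without a prefix or suffix, such as Alameda Padre Serra, pass through unchanged, with empty Dir and Sur_type values.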

Question: Given your earlier experience with address matching, do you think that this will improve your match rate? Do you think that you will get any good matches, or only partial matches?


Create a new address locator using the updated table, just as we did earlier in the lab. Remember to use the fields we created, and to update the field settings and the spelling sensitivity value. Then re-geocode the addresses.


Question: What happened when you used Batch Match this time? Do you need to do interactive re-matching? Do you think that it is worthwhile to fully edit and optimize your street data table prior to address matching?


Question: It is, in fact, true that the streets in Santa Barbara are numbered from 1 to 56. Do you believe that your final map reflects the true locations of your 5 addresses? Is there a way to fix this? How would you do it?


The previous question deals with the problem of variation from the standard 1 to 100 numbering of street addresses on blocks in the United States. In Santa Barbara, street blocks are numbered from 1 to 56. Using your solution to the question, make the changes needed to the database, geocode the new data, and print the new view.


Turn this new view in with your lab.


Question: Do you think that making this change is important? Can you think of a real world problem that could be caused by poor address matching?


Conclusion


In this lab you have explored issues of data accuracy and data optimization for geocoding. It should be clear that although geocoding is an automated procedure, the GIS user must still control many features of the process in order to produce sound analytic results. Understanding the software documentation, and referring to it when needed, is an important part of exercising that control.