LECTURE 4 - DISCRETE GEOREFERENCING
BASED ON UNIT 29 - DISCRETE GEOREFERENCING - OF THE 1990 NCGIA CORE CURRICULUM
IN GIS
Initial HTML-ization by Brian Klinkenberg, University of British Columbia
A.
INTRODUCTION
B.
STREET ADDRESS
C.
POSTAL CODE SYSTEMS
D.
US PUBLIC LAND SURVEY SYSTEM
E.
GEOLOC GRID
F.
CENSUS SYSTEMS
G.
ISSUES CONCERNING DISCRETE GEOREFERENCING
REFERENCES
DISCUSSION OR EXAM QUESTIONS
NOTES
This lecture concludes the module on geocoding. It raises several practical
issues that are particularly important for those who will be working with
economic and demographic databases.
A.
INTRODUCTION
-
the georeferencing methods covered so far (latitude-longitude, Cartesian,
projections from latitude/longitude to the plane) are continuous
-
this means that there is no effective limit to precision, as coordinates
are measured on continuous scales
-
we will now look at discrete methods: systems of georeferencing for discrete
units on the Earth's surface
-
many of these methods are indirect
-
this means that the method provides a key or index, which can then be used
with a table to determine latitude/longitude or coordinates
-
for example: a ZIP code is an indirect georeference
-
rather than give latitude/longitude for a place directly, it provides a
unique number which can be looked up on a map if coordinates are needed
-
indirect methods rely on having standard, readily available reference data
sets, e.g. to provide the coordinates of ZIP codes
-
because these methods are indirect, it is important to consider the precision
of these systems
-
precision is related directly to the size of the discrete unit which forms
the basis of the georeferencing system
-
it's also important to consider how indirect systems are maintained - who's
responsible, who makes sure they're accurate, who authorizes changes
-
many methods of indirect or discrete georeferencing are in common use
-
following are 5 of the most common
B.
STREET ADDRESS
-
the precision of street addresses as georeferences varies:
-
is highest for apartments or houses in cities
-
is lowest for rural addresses or post office box numbers, where the address
may indicate only that the place is somewhere in the area served by the
post office
-
who's in charge?
-
municipal government
-
what happens when a street forms the border between two municipalities?
-
variation in spelling
-
addresses are used for many purposes
-
e.g. street addresses for identifying property for tax purposes, delivering
mail, emergency response
-
can one system fit all purposes?
Using addresses in GIS
-
general approach is to match address to a list of streets (called address
matching or "addmatch")
-
spelling and punctuation variations make this difficult
-
e.g., Ave. or Avenue, apartment number before or after street number
-
a failure rate of 10% is regarded as good, and 40% is not uncommon. Failed
matches must be located on the street by hand, which may take as much
as 5 minutes per address in large cities
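The spelling and punctuation problem described above is usually attacked by normalizing street names before comparing them. The following is a minimal sketch of that idea, not the method of any particular addmatch package. The abbreviation tables and the function name `normalize_street` are hypothetical, and the tables are far smaller than a working matcher would need.

```python
import re

# Hypothetical, deliberately tiny abbreviation tables (assumptions for
# illustration; a real matcher needs a much larger, locally tuned list).
SUFFIXES = {"AVENUE": "AVE", "STREET": "ST", "BOULEVARD": "BLVD", "ROAD": "RD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_street(name: str) -> str:
    """Reduce a street name to a canonical form for comparison."""
    # Upper-case, replace punctuation with spaces, split into words.
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    directions, rest = [], []
    for tok in tokens:
        tok = DIRECTIONS.get(tok, tok)   # WEST -> W
        tok = SUFFIXES.get(tok, tok)     # AVENUE -> AVE
        if tok in DIRECTIONS.values():
            directions.append(tok)       # collect directions separately
        else:
            rest.append(tok)
    # Always put the direction first, so "Broadway W." equals "W. Broadway".
    return " ".join(directions + rest)
```

With this sketch, "West Broadway", "W. Broadway" and "Broadway W." all reduce to the same string. It would still mishandle streets whose name is itself a direction letter (e.g. "E St."), which is one reason match rates stay well below 100%.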
Method
1. identify the block containing the address from a table of address ranges
for each block
-
e.g., 551 B St. lies in the block running from 501 to 599
2. estimate position of house using the coordinates of the end points
-
the exact position of the house can be estimated by linear interpolation
-
e.g., 551 is roughly halfway down a block numbered 501 to 599
-
such estimates are crude
-
in many countries (e.g. India, Japan) addresses are not sequential along
the street, but reflect date of construction
-
if the street is curved the estimate can be improved by using intermediate
points (called shape points)
-
shape points are associated with the same information that block endpoints
have, including building numbers and other georeferences
-
databases to support addmatching exist in most industrialized countries.
-
in the US, DIME files were developed for this purpose in the late 1960s
by the Bureau of the Census, and are now being replaced by more comprehensive
TIGER files
-
there are many commercial vendors, e.g. ETAK, Navigation Technologies,
Thomas Brothers
-
many sites on the Web do this, e.g. Mapquest
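The two-step method described earlier in this section (find the block, then interpolate, using shape points where the street is curved) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm of any vendor or of the Census Bureau: the function names are hypothetical, coordinates are treated as planar (reasonable over a single block), and houses are assumed to be evenly spaced over the full address range.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in any planar units


def address_fraction(number: int, low: int, high: int) -> float:
    """Fraction of the way along the block, from its address range."""
    if not low <= number <= high:
        raise ValueError("address is outside this block's range")
    if high == low:
        return 0.5  # single-address block: use the midpoint
    return (number - low) / (high - low)


def interpolate_along(points: List[Point], fraction: float) -> Point:
    """Point a given fraction (0..1) of the way along a polyline.

    `points` holds the block's start point, any shape points, and its
    end point, in order.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        return points[0]
    target = max(0.0, min(1.0, fraction)) * total
    for (a, b), length in zip(segments, lengths):
        if length > 0 and target <= length:
            t = target / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= length  # move on to the next segment
    return points[-1]
```

For 551 B St. in a block numbered 501 to 599, `address_fraction(551, 501, 599)` is about 0.51, and passing that fraction to `interpolate_along` with the block's endpoints (plus any shape points) gives the estimated position.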
Example - Addmatch using TIGER
Problem: find the latitude and longitude of 950 West Broadway, Columbia, MO
Here are some records from that database:
12901900006711 A Broadway
W A40 600 698
601 699
6520365203
292901901902502509300930 6 7
252 553
9234550038951400 9234700038951600
12901900006712 A Broadway
W A40 700 798
701 799
6520365203
292901901902502509300930 6 7
253 554
9234700038951600 9234890038951800
12901900006713 A Broadway
W A40 800 898
801 899
6520365203
292901901902502509300930 6 7
253 556
9234890038951800 9235030038951900
12901900006714 A Broadway
W A40 900 998
901 999
6520365203
292901901902502509300930
9235030038951900 9235270038952200
Procedure:
1. search the TIGER file for Boone County, MO for features with the
name "West Broadway" or equivalent (W. Broadway, Broadway W., etc.)
-
get about 30 matches for the length of W. Broadway
2. find the record that lists the address range which includes address
950:
-
record #6714 covers the block from Greenwood to West Blvd, and includes
the following data:
-
longitude 92.3503 to 92.3527 (degrees west); 0.0001 degrees of longitude is
about 9m on the ground at this latitude, so the block is about 216m long
-
latitude 38.9519 to 38.9522
-
ZIP code 65203 on both sides
-
census tract 6 on the left side, 7 on the right
-
address ranges 900 to 998 on the left, 901 to 999 on the right
-
no shape points, so we assume the block is straight
3. determine the coordinates of number 950:
-
assume that the houses are evenly spaced along the street, and that the
full range of addresses is used (this is not necessarily a good assumption,
but it's the best that can be done without more information).
-
longitude is: 92.3503 + (950 - 900) * (92.3527 - 92.3503) / (998 - 900) =
92.3515
-
latitude is: 38.9519 + (950 - 900) * (38.9522 - 38.9519) / (998 - 900) = 38.9521
-
note that the results are given to the same precision as the block endpoints,
that is, to about 10m
-
we could have calculated more digits, but they would have been meaningless
given the accuracy of the inputs (the database was built from 1:24,000
mapping)
-
problems with determining georeferences by address matching:
-
cases where matching fails (failure rates of 10-40% are common)
-
rural areas and box numbers where there are no street addresses
-
long blocks with unevenly spaced houses
-
street addresses do not always identify a parcel or lot, and some parcels
have many street addresses (e.g., apartments, condominiums)
-
address matching is very commonly used to determine georeferences for marketing
and retailing, health, and the collection of social statistics
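The interpolation in step 3 of the West Broadway example, and the rough ground distances quoted in step 2, can be checked with a few lines of code. This is a sketch only: the function names are hypothetical, west longitude is written as a positive number as in the example, and the ground distance uses a spherical-Earth approximation of about 111.32 km per degree (an assumption, not a value from the TIGER documentation).

```python
import math

METERS_PER_DEGREE = 111_320.0  # assumed: approximate length of one degree on a sphere


def interpolate_address(number, low, high, start, end):
    """Linear interpolation of (lon, lat) between the block endpoints."""
    fraction = (number - low) / (high - low)
    return tuple(s + fraction * (e - s) for s, e in zip(start, end))


def block_length_m(start, end):
    """Approximate ground length of a short block from (lon, lat) endpoints."""
    mean_lat = math.radians((start[1] + end[1]) / 2)
    # A degree of longitude shrinks with the cosine of the latitude.
    dx = (end[0] - start[0]) * METERS_PER_DEGREE * math.cos(mean_lat)
    dy = (end[1] - start[1]) * METERS_PER_DEGREE
    return math.hypot(dx, dy)


# Endpoints of record 6714 as (west longitude, latitude).
start, end = (92.3503, 38.9519), (92.3527, 38.9522)
lon, lat = interpolate_address(950, 900, 998, start, end)
print(round(lon, 4), round(lat, 4))   # 92.3515 38.9521, as in the hand calculation
print(round(block_length_m(start, end)), "m")
```

The computed block length comes out a little shorter than the 216m quoted in the example, because the example rounds 0.0001 degrees of longitude up to 9m.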
C.
POSTAL CODE SYSTEMS
-
postal code systems have been set up in many countries
-
these often provide a high level of spatial precision
US ZIP Codes
-
in the US, ZIP codes are designed to assist with mail sorting and delivery
-
the codes are hierarchically nested; each state is uniquely identified by
one or more values of the first 2 digits
-
a 5 digit ZIP code identifies the area served by a single post office
-
this gives precision of many city blocks
-
the 9 digit ZIP potentially provides a much higher level of spatial resolution,
but problems exist
-
buildings may have different codes for different floors
-
overlapping and fragmented boundaries
Problems:
-
addresses associated with a single ZIP code were developed from lists of
addresses representing postal walks, rather than from maps. Addresses were
seen as points along the streets rather than parcels of land
-
ZIP code map of Los Angeles
-
note unusual shapes of zones and boundaries
Canadian Postal Code
-
the first 3 characters of the Canadian postal code define a Forward Sortation
Area, which is a useful unit for mapping (average population around 20,000)
and is hierarchically nested within provinces
-
the full 6 characters provide resolution of a few block faces
-
files exist which allow the 6 character code to be converted to census reporting
zones and latitude/longitude
-
similar codes are used in the UK; France's system is more like ZIPs
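Because both systems are hierarchically nested, a coarser georeference can be obtained simply by truncating the code. The sketch below illustrates this for the two levels of each system described above. The function names are hypothetical, the validation is minimal (it checks length only, not whether a code is actually in use), and the sample codes in the usage note are illustrations.

```python
def zip_levels(zip_code: str) -> dict:
    """Split a US ZIP code into its 5-digit and (if present) 9-digit forms."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) not in (5, 9):
        raise ValueError("expected a 5 or 9 digit ZIP code")
    levels = {"zip5": digits[:5]}  # area served by one post office
    if len(digits) == 9:
        levels["zip9"] = digits[:5] + "-" + digits[5:]
    return levels


def canadian_levels(postal_code: str) -> dict:
    """Split a Canadian postal code into Forward Sortation Area and full code."""
    code = postal_code.replace(" ", "").upper()
    if len(code) != 6:
        raise ValueError("expected a 6 character postal code")
    return {"fsa": code[:3], "full": code[:3] + " " + code[3:]}
```

For example, `canadian_levels("v6t1z2")` gives the Forward Sortation Area `V6T`, and `zip_levels("65203-1234")` (the last four digits are made up) gives the 5 digit code `65203`. Customer records keyed on the full code can therefore be aggregated to the coarser unit without any reference files, although converting either level to coordinates still requires a lookup table.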
Problems
-
postal code systems have great potential as discrete georeferences
-
however, they have not been designed for this purpose, hence the problems
noted above
-
since their purpose is, in principle, internal to the postal system, it
is also difficult to ensure stability through time (codes frequently change)
-
however, there is great demand for statistics based on postal georeferences
because of their applications in retailing and marketing and the ease with
which they can be merged with customer account data
D.
US PUBLIC LAND SURVEY SYSTEM
-
PLSS is the basis for land surveys and legal land description over much
of the US
-
unlike the previous systems, it is designed to reference land parcels
-
because it is a comprehensive, systematic approach it is possible to use
it as a georeference
-
commonly used by agencies such as the Bureau of Land Management and the
US Forest Service, and within the oil and gas industry.
-
packages exist to convert PLSS descriptions to latitude/longitude
PLSS References
-
begin with a surveyed Principal Meridian, several of which were laid out
as north-south baselines in the Western US
-
the area on both sides of the meridian is then blocked off in 6 mile by
6 mile areas, identified by township and range numbers
-
because meridians converge toward the poles, a square grid cannot be extended
indefinitely, so the townships and ranges must be offset periodically as one
moves north-south along the meridians
-
the 36 one-square-mile sections within each township are numbered from the
top in a standard order
-
each section is divided into four quarter sections, and these can be further
divided if higher spatial resolution is needed, as for example in describing
the location of an oil well
-
PLSS is most effective where the simple rules were followed closely; however:
-
much of the Northeast was settled long before the advent of the PLSS
-
there are major variations in the Southwest where the PLSS runs up against
areas of early Spanish land tenure
-
errors in the early surveys have become embedded in the system and must
be replicated in packages which offer PLSS to latitude/longitude conversion
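The layout of sections within a township, and the way precision increases with each quartering, can be sketched in code. The text above says only that sections are numbered "from the top in a standard order". The sketch assumes the usual PLSS convention, which is a serpentine order starting with section 1 in the northeast corner. It also assumes an ideal section of exactly 640 acres, which real sections often are not because of the survey errors noted above. Function names are hypothetical.

```python
def section_position(section: int) -> tuple:
    """Return (row, col) of a section in its township.

    Row 0 is the northern row and col 0 the western column. Assumes the
    serpentine numbering: 1-6 run east to west along the top row, 7-12
    run west to east along the next row, and so on down to 36 in the
    southeast corner.
    """
    if not 1 <= section <= 36:
        raise ValueError("section must be between 1 and 36")
    row, offset = divmod(section - 1, 6)
    col = 5 - offset if row % 2 == 0 else offset
    return row, col


def aliquot_acres(divisions: int) -> float:
    """Nominal area in acres after quartering a 640-acre section `divisions` times."""
    if divisions < 0:
        raise ValueError("divisions must not be negative")
    return 640 / 4 ** divisions
```

Under these assumptions a quarter section is nominally 160 acres (`aliquot_acres(1)`) and a quarter of a quarter section is 40 acres (`aliquot_acres(2)`). Converting a full township, range and section description to latitude/longitude additionally requires the surveyed positions of the meridian and township corners, which is what the conversion packages supply.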
E.
GEOLOC GRID
-
an elaborate and more systematic example is provided by the GEOLOC geographical
referencing system (see Whitson and Sety, 1987), which can be used to index
every 100 acre parcel in the continental US
GEOLOC References
-
the first level of partition consists of 2 rows and 3 columns, each partition
or tile being 25 degrees of longitude by 13 degrees of latitude
-
these tiles are ordered row by row from the top left (Pacific Northwest)
and numbered 1 to 6
-
at the next level, each tile is divided into 26 rows of one half degree
latitude and 25 columns of one degree longitude, the area covered by one
1:100,000 USGS quadrangle.
-
each of these subtiles is given a two letter designation using a letter
to represent the row (A through Z) and one to represent the column (A through
Y)
-
each subtile is divided into 4 rows and 8 columns of 7.5 minute quads,
numbered row by row from 1 to 32
-
at the next level, these are divided into 4 rows and 2 columns, designated
by assigning the letters A through H row by row
-
finally, each of these divisions is divided into 5 rows, lettered A through
E, and 10 columns numbered 0 through 9 to produce 50 cells of approximately
100 acres each
-
an example of a full designator for a 100 acre parcel (in the Los Angeles
area) is 4FG19DC6
Precision
-
hierarchically nested systems like GEOLOC, and to some extent PLSS, allow
the user to vary spatial precision depending on the application
-
4FG19 would identify a 7.5 minute quadrangle, or an area roughly 9 miles
across
-
the full 4FG19DC6 gives an area roughly 2000 ft across
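The successive partitions described in this section can be sketched as a function that turns a latitude/longitude into a GEOLOC-style designator. This is a reconstruction from the description above, not the published GEOLOC algorithm. The grid origin (northwest corner at 50 degrees N, 125 degrees W) is an assumption that is not stated in these notes, although it is consistent with a designator beginning 4FG for the Los Angeles area. Whether quad numbers below 10 are zero-padded is also unknown, so they are written unpadded here. The function name is hypothetical.

```python
import math
import string

CELLS_PER_DEGREE = 160                 # smallest cell is 1/160 degree on a side
NORTH_EDGE, WEST_EDGE = 50.0, -125.0   # ASSUMED origin, not given in the notes


def geoloc(lat: float, lon: float) -> str:
    """GEOLOC-style designator for a point (lon is negative in the west)."""
    if not (24.0 < lat <= 50.0 and -125.0 <= lon < -50.0):
        raise ValueError("point is outside the assumed GEOLOC coverage")
    # Count smallest cells south and east of the assumed northwest corner.
    y = math.floor((NORTH_EDGE - lat) * CELLS_PER_DEGREE)
    x = math.floor((lon - WEST_EDGE) * CELLS_PER_DEGREE)
    tile_row, y = divmod(y, 13 * CELLS_PER_DEGREE)   # tiles: 13 deg of latitude
    tile_col, x = divmod(x, 25 * CELLS_PER_DEGREE)   #        25 deg of longitude
    sub_row, y = divmod(y, 80)     # subtiles: 0.5 deg of latitude
    sub_col, x = divmod(x, 160)    #           1 deg of longitude
    quad_row, y = divmod(y, 20)    # 7.5 minute quads, 4 rows x 8 columns
    quad_col, x = divmod(x, 20)
    part_row, y = divmod(y, 5)     # 4 rows x 2 columns, lettered A-H
    part_col, x = divmod(x, 10)
    letters = string.ascii_uppercase
    return (
        f"{tile_row * 3 + tile_col + 1}"
        f"{letters[sub_row]}{letters[sub_col]}"
        f"{quad_row * 8 + quad_col + 1}"
        f"{letters[part_row * 2 + part_col]}"
        f"{letters[y]}{x}"             # final cell: row A-E, column 0-9
    )
```

Under the assumed origin, `geoloc(34.203, -118.647)` reproduces the example designator 4FG19DC6, and truncating the result (for instance to its first five characters, 4FG19) gives the coarser 7.5 minute quadrangle, which is how the variable precision described above works in practice.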
F.
CENSUS SYSTEMS
-
the major source of social and economic data in many countries is the Census
-
statistics are collected and reported using a complex system of several
different types of reporting zones:
-
political or administrative units used for reporting (province, county,
city, electoral district)
-
units defined for ease of data collection (block, block group, enumeration
district) but often too small to use for data reporting due to privacy
regulations
-
units designed to be homogeneous for ease of analysis (census tract)
-
in the US the major units are:
-
block group (formerly enumeration district)
-
the smallest reporting unit, about 1000 population
-
census tract
-
primarily in large cities, about 5000 population, intended for analysis
-
Minor Civil Division (mostly on township boundaries)
-
County
-
State
-
several Web sites offer Census data in various forms
Converting to georeferences
-
for the larger units, the main method of converting from census zone to
georeference is through boundary files, which are digitized boundaries
established for most of the major units and readily available from vendors
or the Bureau
-
for a smaller unit such as the block group (formerly ED) it is often possible
to obtain from the Census Bureau a representative point or centroid which
can be used as a georeference
-
for units with uneven population distribution the centroid may be located
in the area of highest population density
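One simple way to obtain a representative point that is pulled toward the populated part of a zone is a population-weighted mean of smaller units (for example, blocks within a block group). The sketch below illustrates that idea only. It is not necessarily how the Census Bureau derives its published points, and a weighted mean does not always fall in the densest area. The function name and the planar (x, y) coordinates are assumptions.

```python
def weighted_centroid(points):
    """Population-weighted mean of (x, y, population) triples."""
    points = list(points)
    total = sum(pop for _, _, pop in points)
    if total <= 0:
        raise ValueError("total population must be positive")
    x = sum(px * pop for px, _, pop in points) / total
    y = sum(py * pop for _, py, pop in points) / total
    return x, y
```

For two sub-units at x = 0 and x = 10 with populations 100 and 300, the result lies at x = 7.5, three quarters of the way toward the more populous unit, whereas an unweighted centroid would sit at x = 5.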
G.
ISSUES CONCERNING DISCRETE GEOREFERENCING
Hooks
-
it is useful to consider how many different reference systems are related
to specific datasets
-
e.g. TIGER has street addresses, census zones and lat/long associated with
each record
-
allows linking of many different data sources
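The way a single record can act as a hook between reference systems can be sketched with a small data structure. The values below are those listed for the 900 block of West Broadway in the address matching example (ZIP code 65203 on both sides, census tract 6 on the left and 7 on the right). The class and field names are hypothetical and do not reflect the actual TIGER record layout, and the rule that odd and even numbers fall on opposite sides is inferred from the address ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockSide:
    low: int        # lowest address on this side of the street
    high: int       # highest address on this side
    zip_code: str
    tract: str


@dataclass
class Segment:
    name: str
    left: BlockSide
    right: BlockSide

    def side_for(self, number: int) -> Optional[BlockSide]:
        """Side of the street holding this address, or None if out of range."""
        for side in (self.left, self.right):
            same_parity = number % 2 == side.low % 2
            if side.low <= number <= side.high and same_parity:
                return side
        return None


broadway_900 = Segment(
    name="W Broadway",
    left=BlockSide(900, 998, "65203", "6"),
    right=BlockSide(901, 999, "65203", "7"),
)
```

Here `broadway_900.side_for(950)` returns the left side, so a customer record carrying only the street address 950 West Broadway can be linked to ZIP code 65203 and census tract 6 without ever computing coordinates.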
Purpose
-
many of these systems were set up for special purposes, and have only later
become the basis for general georeferencing
-
e.g. the post office has no mandate to maintain these systems for
georeferencing purposes, and therefore will only add ZIPs when mail is delivered
to the location
-
zones may change without notice or record
-
e.g. census is only updated every 10 years
-
as a result, these systems do not necessarily have "quality control" in
the georeferencing sense
-
no agency maintains a file of new addresses
Standardization
-
general purpose systems such as GEOLOC use regular divisions of the Earth's
surface, while special purpose systems tend to use irregular divisions
-
in the past, efforts have been made to impose greater regularity on discrete
georeferences
-
e.g. "gridiron" system of rectangular street networks (Washington, DC)
-
in the 19th century some city names were changed so that no two places
in a single state had the same name
-
introduction of the ZIP code
-
however, such standardization efforts generally are not consistent or long-term
-
rectangular street networks are no longer in fashion
-
referencing systems such as PLSS are now fairly chaotic despite simple
principles
-
ZIPs are not consistent
-
given their usefulness, is it possible to set up a single, common system
of discrete georeferencing?
REFERENCES
Strahler, A.N. and A.H. Strahler, 1987. Modern Physical Geography, 3rd
edition, Wiley, New York. Contains a thorough description of the US PLSS.
U.S. Department of Commerce, Bureau of the Census, 1988. TIGER/Line
File: Boone County, Missouri, Technical Documentation, Washington, D.C.
Whitson, J. and M. Sety, 1987. "GEOLOC Geographic Location System",
Fire Management Notes, 46:30-32.
DISCUSSION OR EXAM QUESTIONS
1. Determine the resources available to you in geocoding street addresses
for your local area. What sources exist for obtaining (a) street index
(DIME or TIGER) files, (b) address matching software, (c) maps with address
ranges marked on streets? Estimate the time it would take to geocode 1000
addresses in this area using various combinations of these resources. What
percentages of hits and misses would you anticipate? Estimate the cost
per address which you would have to charge a sponsoring agency for such
a project.
2. Discuss the usefulness of the PLSS as a georeferencing system in
your local area. How complete is it? What local agencies or organizations
make use of the PLSS? What is its relationship to the local system of land
tenure?
3. Determine the 5 discrete georeferences described in this unit for
your own residence. What problems do you have in doing this? What is the
potential or actual precision of each method?
4. Discuss the ways in which the system of discrete georeferencing in
the US (or your own country) might be improved. What is the appropriate
level or agency of government to sponsor or undertake such an improvement?
Which existing system of georeferencing should it be based on? Who are
the potential users of such a system, and how might cost be shared?