2000 ESRI USER CONFERENCE
Pre-Conference Seminar
SPATIAL ANALYSIS and GIS

Michael F. Goodchild
National Center for Geographic Information and Analysis
University of California
Santa Barbara, CA 93106
805 893 8049 (phone)
805 893 3146 (FAX)
805 893 8224 (NCGIA)
good@geog.ucsb.edu

June 25, 2000

Schedule

Four sessions:

Sunday June 25th:

8:00am - 9:45am

10:15am - 12:00pm

Lunch

1:30pm - 3:00pm

3:30pm - 5:00pm

  Instructor profile

Michael F. Goodchild is Professor of Geography at the University of California, Santa Barbara; Chair of the Executive Committee, National Center for Geographic Information and Analysis (NCGIA); Associate Director of the Alexandria Digital Library Project; and Director of NCGIA’s Center for Spatially Integrated Social Science. He received his BA degree from Cambridge University in Physics in 1965 and his PhD in Geography from McMaster University in 1969. After 19 years at the University of Western Ontario, including three years as Chair, he moved to Santa Barbara in 1988. He was Director of NCGIA from 1991 to 1997. In 1999 he was awarded an honorary doctorate by Laval University. In 1990 he was given the Canadian Association of Geographers Award for Scholarly Distinction, in 1996 the Association of American Geographers award for Outstanding Scholarship, and in 1999 the Canadian Cartographic Association’s Award of Distinction for Exceptional Contributions to Cartography; he has won the American Society of Photogrammetry and Remote Sensing Intergraph Award and twice won the Horwood Critique Prize of the Urban and Regional Information Systems Association. He was Editor of Geographical Analysis between 1987 and 1990, and serves on the editorial boards of ten other journals and book series. In 2000 he was appointed Editor of the Methods, Models, and Geographic Information Sciences section of the Annals of the Association of American Geographers. His major publications include Geographical Information Systems: Principles and Applications (1991); Environmental Modeling with GIS (1993); Accuracy of Spatial Databases (1989); GIS and Environmental Modeling: Progress and Research Issues (1996); Scale in Remote Sensing and GIS (1997); Interoperating Geographic Information Systems (1999); and Geographical Information Systems: Principles, Techniques, Management and Applications (1999); in addition he is author of some 300 scientific papers. 
He was Chair of the National Research Council’s Mapping Science Committee from 1997 to 1999, and is currently a member of NRC's Commission on Physical Sciences, Mathematics, and Applications. His current research interests center on geographic information science, spatial analysis, the future of the library, and uncertainty in geographic data.

For a complete CV see the NCGIA web site www.ncgia.ucsb.edu under Personnel
Other related web sites: UCSB Geography www.geog.ucsb.edu, Alexandria Digital Library alexandria.ucsb.edu


TABLE OF CONTENTS

Outline:

1. What is Spatial Analysis?

        Basic GIS data models

        GIS function descriptions

 2. Spatial Statistics

        Spatial interpolation

        Exploratory spatial analysis

3. Spatial Interaction Models

4. Spatial Dependence

5. Spatial Decision Support

        Spatial search

        Districting


What is Spatial Analysis?

GIS is designed to support a range of different kinds of analysis of geographic information: techniques to examine and explore data from a geographic perspective, to develop and test models, and to present data in ways that lead to greater insight and understanding. All of these techniques fall under the general umbrella of "spatial analysis". Just as statistical packages such as SAS, SPSS, S, or Systat allow the user to analyze numerical data using statistical techniques, GIS packages such as ArcInfo give access to a powerful array of methods of spatial analysis.
 
 

Purpose of the Course

The course will introduce participants with some knowledge of GIS to the capabilities of spatial analysis. Each of the five major sections will cover a major application area and review the techniques available, as well as some of the more fundamental issues encountered in doing spatial analysis with a GIS.
 
 

Outline

Section 1 - What is spatial analysis? - Basic GIS concepts for spatial analysis - GIS functionality - Integrating GIS and spatial analysis - Issues of error and uncertainty:

Section 2 - Spatial statistics - Simple measures for exploring geographic information - The value of the spatial perspective on data - Intuition and where it fails - Applications in crime analysis, emergencies, incidence of disease:

Section 3 - Spatial interaction models - What they are and where they're used - Calibration and "what-if" - Trade area analysis and market penetration:

Section 4 - Spatial dependence - Looking at causes and effects in a geographical context:

Section 5 - Site selection - Locational analysis and location/allocation - Other forms of operations research in spatial analysis - Spatial decision support systems - Linking spatial analysis with GIS to support spatial decision-making:

 
 

SECTION 1
WHAT IS SPATIAL ANALYSIS?

Section 1 - What is spatial analysis? - Basic GIS concepts for spatial analysis - GIS functionality - Integrating GIS and spatial analysis - Issues of error and uncertainty:

What is spatial analysis?

A set of techniques for analyzing spatial data

  • used to gain insight as well as to test models
  • ranging from inductive to deductive
  • finding new theories as well as testing old ones
  • can be highly technical, mathematical

    Definitions

    "A set of techniques whose results are dependent on the locations of the objects being analyzed"

  • move the objects, and the results change
  • e.g. move the people, and the US Center of Population moves
  • by contrast, move the people, and average income does not change
  • most statistical techniques are invariant under changes of location
  • compare the techniques in SAS, SPSS, Systat, etc.
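A minimal Python sketch of the first definition, with made-up people (the coordinates in km and the incomes are hypothetical): moving everyone shifts the mean center, a spatial measure, but leaves average income, an aspatial statistic, unchanged.

```python
from statistics import mean

# Hypothetical people: ((x, y) location in km, annual income in dollars).
people = [((0.0, 0.0), 40000), ((10.0, 0.0), 60000), ((0.0, 10.0), 50000)]

def mean_center(records):
    """Mean center of the locations: depends on where the objects are."""
    xs = [loc[0] for loc, _ in records]
    ys = [loc[1] for loc, _ in records]
    return (mean(xs), mean(ys))

def average_income(records):
    """An aspatial statistic: location plays no part in the result."""
    return mean(income for _, income in records)

def shift(records, dx, dy):
    """Move every object by (dx, dy) while keeping its attributes."""
    return [((x + dx, y + dy), income) for (x, y), income in records]

moved = shift(people, 5.0, -3.0)
print(mean_center(people), mean_center(moved))        # the center moves
print(average_income(people), average_income(moved))  # income does not change
```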

    "A set of techniques requiring access both to the locations of objects and also to their attributes"

  • requires methods for describing locations (i.e. a GIS)
  • some techniques do not look at attributes
  • is mapping a form of spatial analysis?

    Is spatial analysis the ultimate objective of GIS?
     

    Some books on spatial analysis:

    Some background slides:
    Landsat image of New York area

    Indianapolis database

    Snow map of Soho, 1854

    Openshaw GAM map of NE England

    Atlantic Monthly mystery map

    Northridge earthquake epicenters

    Environmental justice in LA

    World map

    England and Wales demography

    South Wales demography

    Vandenberg service station

    Service station subsurface

    Service station plume
     
     

    How does an analyst/modeler/decision-maker work with a GIS?

    What tools exist for helping/conceptualizing/problem-solving?

    Assumption: these (analysis, modeling, decision-making) are the primary purposes of GIS technology.
     

    GIS components:

    Input: digitizing, scanning
        issues: cost
    Storage: data structures
        issues: volume vs speed; raster vs vector; objects vs layers; spatial vs object indexing
    Manipulation, analysis, modeling
        issues: algorithms; response time; menus vs commands
    Output: plot, print, display
        issues: cartographic design; visualization

     

    The cost of input to a GIS is high, and can only be justified by the benefits of analysis/modeling/decision-making performed with the data.

  • digitizing at 60 polygons per hour costs about $1 per polygon (i.e. about $60 per hour)
  • estimates run as high as $40 per polygon
  • a 500,000-polygon database therefore costs $500,000 to create using the low estimate
  • and $20m using the high estimate
  • What types of analysis can justify these costs? The list of possibilities is endless
  • ESRI's ARC/INFO has over 1000 commands/functions
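The cost figures above can be checked with a few lines of arithmetic. The rates are the ones quoted in the notes; the hours figure simply follows from the 60-polygons-per-hour rate.

```python
POLYGONS = 500_000
LOW_COST, HIGH_COST = 1, 40   # dollars per polygon, as quoted in the notes
RATE = 60                     # polygons digitized per hour, as quoted

low_total = POLYGONS * LOW_COST    # 500,000 dollars
high_total = POLYGONS * HIGH_COST  # 20,000,000 dollars
hours = POLYGONS / RATE            # roughly 8,333 hours of digitizing
hourly_cost = RATE * LOW_COST      # the $1 rate implies about $60 per hour

print(low_total, high_total, round(hours), hourly_cost)
```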
    How can we organize/conceptualize the possibilities?

    A geographical data model consists of the set of entities and relationships used to create a representation of the geographical world. The choices made when the world is modeled determine how the database is structured, and what kinds of analysis can be done with it. These choices occur when the data are captured in the field, recorded, mapped, digitized, and processed.

    There are two distinct ways of conceiving of the geographical world.

    In the field view, the world is conceived as a finite set of variables, each having a single value at every point on the Earth's surface (or every point in a three-dimensional space; or a four-dimensional space if time is included).

    To be represented digitally, a field must be constructed out of primitive one-, two-, three-, or four-dimensional objects. There are six ways of representing fields in common use in GIS; other methods can be found in environmental modeling, but are not commonly used in GIS.

    The field view underlies the following ESRI implementation models:

    coverage

    TIN

    grid

    but not shapefiles
    in the Arc8 Geodatabase the distinction can be implemented in object behaviors


    In the discrete object view an otherwise empty space is littered with objects, each of which has a series of attributes. Any point in space (two, three, or four dimensional) can lie in any number of discrete objects, including zero, and objects can therefore overlap, and need not exhaust the space.

    Field and discrete object views can be implemented in either raster or vector forms

    the distinction concerns how the world is conceived, and the rules governing object behavior

    a field can be represented as raster cells, points (e.g., spot heights), triangles (TIN), lines (contours), or areas (land ownership)

    in many of these cases the primitive elements are not real (cannot be located on the ground), but are artifacts of the representation
    If we ignore the field/discrete object distinction we may easily apply meaningless forms of analysis
    buffer makes sense only for discrete objects

    interpolation makes sense only for fields
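A minimal sketch of the two views (all names and values are hypothetical): a field is a function that returns exactly one value at any location, while a collection of discrete objects may have zero, one, or several objects at a location.

```python
from dataclasses import dataclass

def elevation(x: float, y: float) -> float:
    """Field view: exactly one value at every point (a made-up surface)."""
    return 100.0 + 2.0 * x - 0.5 * y

@dataclass
class Zone:
    """Discrete object view: a hypothetical circular zone with a name attribute."""
    name: str
    cx: float
    cy: float
    r: float

    def contains(self, x: float, y: float) -> bool:
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.r ** 2

zones = [Zone("A", 0, 0, 5), Zone("B", 3, 0, 5), Zone("C", 20, 20, 1)]

def objects_at(x: float, y: float) -> list:
    """Objects may overlap and need not exhaust space, so any count is possible."""
    return [z.name for z in zones if z.contains(x, y)]

print(elevation(1.0, 0.0))    # one value
print(objects_at(1.0, 0.0))   # two overlapping objects
print(objects_at(10.0, 10.0)) # no object at all
```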


    Attributes can be of several types:
     

    numeric
    alphanumeric

    quantitative
    qualitative

    nominal
    ordinal
    interval/ratio
    cyclic
     


    Spatial objects are distinguished by their dimensions or topological properties:

            points (0-cells)
            lines (1-cells)
            areas (2-cells)
            volumes (3-cells)
     

    A class of objects is a set with the same topological properties (e.g. all points) and with the same set of attributes (e.g. a set of wells or quarter sections or roads). In the Arc8 Geodatabase a class also has the same behaviors, and may inherit behaviors from other classes. A class is associated with an attribute table.

    Geodatabase introduces a consistent set of terms for primitive geometric objects


    When a class represents a field, certain rules apply to the component objects. The objects belonging to one class of area or volume objects will fill the area and will not overlap (they are space-exhausting, they partition or tessellate the space, they are planar enforced).

             the layer provides one value at every point (recall the definition of a field)

    Slide: Planar enforcement

    Spatial objects are abstractions of reality. Some objects are well-defined (e.g. road, bridge) but others are not. Objects representing a discrete entity view tend to be well-defined; objects representing a field are not.

    A topographic surface can be represented as either a TIN or a DEM.
     

    Slides: Elevation model options

    digital elevation model (raster)

    digitized contours

    triangular mesh

    TIN

    Advantages of TIN:

    Advantages of DEM:

    The Relational Model

    A spatial database consists of a number of classes of spatial objects with associated attribute tables.
     

    The methods used to store the attribute and locational information about the objects are not of immediate concern to the analyst/modeler.

  • In fact this object/attribute view of the database may have little in common with the actual data structures/models used by the system designer.

    The relational model allows the database to encode and represent the complex spatial relationships which exist between objects.

  • A GIS must be capable of computing these relationships through such geometrical operations as intersection.

    Spatial relationships include:

    The potential set of relationships within a complex spatial database is enormous. No system can afford to compute and store all of them in the database.
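Intersection of two line segments is the basic geometric operation behind many of these relationships. A minimal sketch using the standard orientation (cross-product) test; this is a textbook method, not necessarily what any particular GIS implements.

```python
def orientation(p, q, r):
    """+1 if p -> q -> r turns left, -1 if it turns right, 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)

def on_segment(p, q, r):
    """For a point r collinear with pq: True if r lies within pq's bounding box."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(a, b, c, d):
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = orientation(a, b, c), orientation(a, b, d)
    o3, o4 = orientation(c, d, a), orientation(c, d, b)
    if o1 != o2 and o3 != o4:
        return True   # general case: each segment's ends are split by the other
    # Special cases: an endpoint of one segment lies on the other segment
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))

print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))  # crossing diagonals
```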

    A cartographic data structure stores no spatial relationships among objects.

  • Since it must compute any relationship as and when needed, it is inefficient for complex spatial analyses.

    A topological data structure stores certain spatial relationships among objects. Common stored relationships are:

    By storing relationships the system can perform certain related operations more quickly, but the size of the database increases at the same time.

  • The optimum balance between speed and volume depends on the set of likely operations, which is determined by the area of application.

    We can easily visualize relationships as additional attributes of the relevant object classes.

  • Is this necessarily desirable?
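To see what storing relationships buys, suppose (hypothetically) that the arc table of a topological structure records the polygon on the left and on the right of each arc, as arc-node designs commonly do. Contiguity then becomes a scan of that table with no coordinate geometry, whereas a cartographic structure would have to compare boundary coordinates to reach the same answer.

```python
from collections import defaultdict

OUTSIDE = "outside"   # hypothetical code for the area beyond the map

# Hypothetical arc table: arc id -> (left polygon, right polygon)
arcs = {
    1: ("A", OUTSIDE),
    2: ("B", OUTSIDE),
    3: ("A", "B"),
    4: ("C", OUTSIDE),
    5: ("B", "C"),
}

def adjacency(arc_table):
    """Polygon contiguity read directly from the stored left/right codes."""
    adj = defaultdict(set)
    for left, right in arc_table.values():
        if OUTSIDE not in (left, right) and left != right:
            adj[left].add(right)
            adj[right].add(left)
    return dict(adj)

print(adjacency(arcs))   # A touches B, B touches A and C
```

The price is the extra left/right columns, which is the speed-versus-volume trade-off described above.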
    Relations between objects

    An object pair is a combination of objects of the same or different types/classes which may have its own attributes.

  • e.g. the hydrologic relationship between a spring and a sink may have attributes (direction, volume of flow, flow through time) but may not exist as a spatial object itself.
  • The ability to generate object pairs, give them attributes and include them in analysis is an important component of a full GIS.
     
     

    Examples of object pairs:


    Object pairs in ESRI products

    turntable (link-link pairs)

    distance matrix (first object, second object, distance)

    association class in UML

    attributed relationship class in Geodatabase

    Visio example
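A distance matrix stored as object pairs can be sketched as a list of (first object, second object, distance) records, where the distance is an attribute of the pair rather than of either object. The well names and coordinates below are hypothetical.

```python
from itertools import combinations
from math import dist

wells = {"W1": (0.0, 0.0), "W2": (3.0, 4.0), "W3": (6.0, 8.0)}

def distance_pairs(objects):
    """One record per unordered pair: (first object, second object, distance)."""
    return [(a, b, dist(objects[a], objects[b]))
            for a, b in combinations(sorted(objects), 2)]

for record in distance_pairs(wells):
    print(record)
```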


    Example: Data Model for Traffic Routing

    What are the essential components of a data model for route planning in a complex street network?

    Data modeling examples

    1. Design a database to capture and analyze data on recreational fishing in the Scottish Highlands, to support decision-making by the tourist industry and regulatory agencies. The database should be able to represent the following:

    2. Design a database to support analysis and modeling of shoreline erosion on the Great Lakes. It is necessary to represent conditions and processes transverse to the shoreline in much more detail than variation parallel to the shoreline.

    3. Design a database to support water resource analysis and planning for complex hydrographic networks that include streams, rivers, lakes and reservoirs.


    GEOGRAPHIC INFORMATION SYSTEM FUNCTION DESCRIPTIONS
     

    A. BASIC SYSTEM CAPABILITIES

    A1 Digitizing (di)

    Digitizing is the process of converting point and line data from source documents to a machine-readable format.

    A2 Edgematching (ed)

    Edgematching is the process of joining lines and polygons across map boundaries in the creation of a "seamless" database. The join should be topological as well as graphic; that is, a polygon so joined should become a single polygon in the data base, and a line so joined should become a single line segment.

    A3 Polygonization (po)

    Polygonizing is the process of connecting together arcs ("spaghetti") to form polygons.

    A4 Labelling (la)

    This process transfers labels describing the contents (attributes) of polygons, and the characteristics of lines and points, to the digital system. This input of labels must not be confused with the process of symbolizing and labelling output described below.

    A5 Reformatting digital data for input from other systems (rf)

    Data previously digitized are made accessible through an interface or converted by software to the system format, and made to be topologically useful as well as graphically compatible.

    A6 Reformatting for output to other systems (ro)

    This function is the inverse of the previous one. Internal data is reformatted to meet the requirements of other systems or standards.

    A7 Data base creation and management (db)

    Data is typically digitized from map-sheets, and may be edgematched. The creation of a true "seamless" database requires the establishment of a map sheet directory, and may include tiling to partition the database.

    A8 Raster/vector conversion (rv)

    The ability to convert data between vector and raster forms with grid cell size, position and orientation selected by the user.

    A9 Edit and display on input (ei)

    This function allows continuous display and editing of input data, usually in conjunction with digitizing.

    A10 Edit and display on output (eo)

    The ability to preview and edit displays before creation of hard copy maps.

    A11 Symbolizing (sy)

    To create high quality output from a GIS, it is necessary to be able to generate a wide variety of symbols to replace the primitive point, line and area objects stored in the database.

    A12 Plotting (pl)

    Creation of hard copy map output.

    A13 Updating (up)

    Updating of the digital data base with new points, lines, polygons and attributes.

    A14 Browsing (br)

    Browse is used to search the data base to answer simple locational queries, and includes pan and zoom.
     
     

    B. DATA MANIPULATION AND ANALYSIS FUNCTIONS

    B1 Create lists and reports (cl)

    This is the ability to create lists and reports on objects and their attributes in user-defined formats, and to include totals and subtotals.

    B2 Reclassify attributes (ra)

    Reclassification is the change in value of a set of existing attributes based on a set of user specified rules.

    B3 Dissolve lines and merge attributes (dm)

    Boundaries between adjacent polygons with identical attributes are dissolved to form larger polygons.

    B4 Line thinning and weeding (lt)

    This process is used to reduce the number of points defining a line or set of lines to a user defined tolerance.
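B4 leaves the thinning rule open. One widely used rule, swapped in here and not necessarily what any particular GIS uses, is the Douglas-Peucker algorithm: keep the two endpoints, find the interior point farthest from the segment joining them, and recurse on both halves only if that distance exceeds the tolerance. A minimal sketch in planar coordinates:

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from p to segment ab (projection clamped to the segment)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Return a thinned copy of the line, keeping both endpoints."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):          # farthest interior point
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [a, b]                            # everything within tolerance
    left = douglas_peucker(points[: idx + 1], tolerance)
    right = douglas_peucker(points[idx:], tolerance)
    return left[:-1] + right                     # avoid duplicating points[idx]

line = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 0.0)]
print(douglas_peucker(line, 0.1))   # wiggles below tolerance are removed
```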

    B5 Line smoothing (ls)

    Automatically smooth lines to a user-defined tolerance, creating a new set of points (compare B4).

    B6 Complex generalization (cg)

    Generalization which may require change in the type of an object, or relocation in response to cartographic rules.

    B7 Windowing (wi)

    The ability to clip features in the database to some defined polygon.

    B8 Centroid calculation and sequential numbering (cn)

    Calculate a contained, representative point in a polygon and assign a unique number to the new object.

    B9 Spot heights (sh)

    Given a digital elevation model, interpolate the height at any point.
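The notes do not say how the height is interpolated; bilinear interpolation between the four surrounding grid nodes is one common choice for a raster DEM. A minimal sketch, assuming (hypothetically) that `dem[row][col]` holds node elevations with row 0 along the southern edge; many raster formats store the northern row first, so check the convention before reusing it.

```python
def bilinear_height(dem, x, y, cell=1.0, x0=0.0, y0=0.0):
    """Interpolate a height at (x, y) from grid-node elevations.

    Assumes node (row, col) sits at (x0 + col*cell, y0 + row*cell) and that
    the grid has at least 2 x 2 nodes.
    """
    gx, gy = (x - x0) / cell, (y - y0) / cell
    rows, cols = len(dem), len(dem[0])
    if not (0 <= gx <= cols - 1 and 0 <= gy <= rows - 1):
        raise ValueError("point lies outside the DEM")
    c = min(int(gx), cols - 2)   # lower-left node of the enclosing cell
    r = min(int(gy), rows - 2)
    fx, fy = gx - c, gy - r      # fractional position within the cell, 0..1
    return (dem[r][c] * (1 - fx) * (1 - fy) + dem[r][c + 1] * fx * (1 - fy)
            + dem[r + 1][c] * (1 - fx) * fy + dem[r + 1][c + 1] * fx * fy)

dem = [[0, 10], [20, 30]]   # hypothetical 2 x 2 grid of elevations
print(bilinear_height(dem, 0.5, 0.5))   # center of the cell
```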

    B10 Heights along streams (hs)

    Given a digital elevation model and a hydrology net, interpolate points along streams at fixed increments of height.

    B11 Contours (isolines) (ci)

    Given a set of regularly or irregularly spaced point values, interpolate contours at user-specified intervals.

    B12 Elevation polygons (ep)

    Given a digital elevation model, interpolate contours of height at user-specified intervals.

    B13 Watershed boundaries (wb)

    Given a digital elevation model and a hydrology net, interpolate the position of the watershed between basins.

    B14 Scale change (sc)

    Perform the operations associated with change of scale, which may include line thinning and generalization.

    B15 Rubber sheet stretching (rs)

    The ability to stretch one map image to fit over another, given common points of known locations.

    B16 Distortion elimination (de)

    The ability to remove various types of systematic distortion generated by different input methods.

    B17 Projection change (pc)

    The ability to transform maps from one map projection to another.

    B18 Generate points (gp)

    The ability to generate points and insert them in the database.

    B19 Generate lines (gl)

    The ability to generate lines and insert them in the database.

    B20 Generate polygons (ga)

    The ability to generate polygons and insert them in the database.

    B21 Generate circles (gc)

    The ability to generate circles defined by center point and radius.

    B22 Generate grid cell nets (gg)

    The ability to generate a network of grid cells given a point of origin, grid cell dimension and orientation.

    B23 Generate latitude/longitude nets (gn)

    The ability to generate graticules for a variety of map projections.

    B24 Generate corridors (gb)

    This process generates corridors of given width around existing points, lines or areas.

    B25 Generate graphs (gr)

    Create a graph illustrating attribute data by symbols, bars or fitted trend line.

    B26 Generate viewshed maps (gv)

    Given a digital elevation model and the locations of one or more viewpoints, generate polygons enclosing the area visible from at least one viewpoint.

    B27 Generate perspective views (ge)

    From a digital elevation model, generate a three-dimensional block diagram.

    B28 Generate cross sections (cs)

    Given a digital elevation model, show the cross-section along a user-specified line.

    B29 Search by attribute (sa)

    The ability to search the data base for objects with certain attributes.

    B30 Search by region (sr)

    The ability to search the data base within any region defined to the system.

    B31 Suppress (su)

    The ability to exclude objects by attribute (the converse of selecting by attribute).

    B32 Measure number of items (mi)

    The ability to count the number of objects in a class.

    B33 Measure distances along straight and convoluted lines (md)

    The ability to measure distances along a prescribed line.

    B34 Measure length of perimeter of areas (mp)

    The ability to measure the length of the perimeter of a polygon.

    B35 Measure size of areas (ma)

    The ability to measure the area of a polygon.
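For polygons stored as rings of planar coordinates, B34 and B35 reduce to summing edge lengths and applying the shoelace formula. A minimal sketch, assuming projected coordinates (latitude/longitude degrees would give meaningless units) and vertices listed once, without repeating the first at the end:

```python
from math import dist

def perimeter(ring):
    """Sum of edge lengths; the last vertex joins back to the first."""
    n = len(ring)
    return sum(dist(ring[i], ring[(i + 1) % n]) for i in range(n))

def area(ring):
    """Shoelace formula; abs() makes it work for clockwise or counterclockwise rings."""
    n = len(ring)
    twice = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(twice) / 2.0

parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # hypothetical 4 x 3 rectangle
print(perimeter(parcel), area(parcel))
```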

    B36 Measure volume (mv)

    The ability to compute the volume under a digital representation of a surface.

    B37 Calculate - arithmetic (ca)

    The ability to perform arithmetic, algebraic and Boolean calculations separately and in combination.

    B38 Calculate bearings between points (cb)

    The ability to calculate the bearing (with respect to True North) from a given point to another point.
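On the sphere, the bearing from one point to another is usually given as the initial great-circle bearing. A minimal sketch assuming latitude/longitude in degrees on a spherical Earth; for projected coordinates, grid north generally differs from True North and a different calculation is needed.

```python
from math import atan2, cos, degrees, radians, sin

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Returned in degrees clockwise from True North, in the range [0, 360).
    """
    p1, p2 = radians(lat1), radians(lat2)
    dl = radians(lon2 - lon1)
    y = sin(dl) * cos(p2)
    x = cos(p1) * sin(p2) - sin(p1) * cos(p2) * cos(dl)
    return (degrees(atan2(y, x)) + 360.0) % 360.0

print(initial_bearing(0.0, 0.0, 0.0, 1.0))   # due east along the equator
```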

    B39 Calculate vertical distance or height (ch)

    Given a digital elevation model, calculate the vertical distance (height) between two points.

    B40 Calculate slopes along lines (gradients) (al)

    The ability to measure the slope between two points of known height and location or to calculate the gradient between any two points along a convoluted line which contains two or more points of known elevation.

    B41 Calculate slopes of areas (sl)

    Given a digital elevation model and the boundary of a specified region (e.g., a part of a watershed), calculate the average slope of the region.

    B42 Calculate aspect of areas (aa)

    Given a digital elevation model and the boundary of a specified region, calculate the average aspect of the region.

    B43 Calculate angles and distances along linear features (ad)

    Given a prescribed linear feature, generalize its shape into a set of angles and distances from a start point, at user-set angular increments, and constrained to any known points along the linear feature.

    B44 Subdivide area according to a set of rules (sb)

    Given the corner points of a rectangular area, topologically subdivide the area into four quarters.

    B45 Locations from traverses (lo)

    Given a direction (one of eight radial directions) and distance from a given point, calculate the end point of the traverse.

    B46 Statistical functions (sf)

    The ability to carry out simple statistical analyses and tests on the database.

    B47 Graphic overlay (go)

    The ability to superimpose graphically one map on another and display the result on a screen or on a plot.

    B48 Point in polygon (pp)

    The ability to superimpose a set of points on a set of polygons and determine which polygon (if any) contains each point.
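A common test for B48 is ray casting (swapped in here; the notes do not name a method): count how many polygon edges a horizontal ray from the point crosses, and an odd count means the point is inside. In the sketch, `locate` is a hypothetical helper that assigns each point to the first containing polygon; points lying exactly on a boundary may be classified either way.

```python
def point_in_polygon(pt, ring):
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def locate(points, polygons):
    """Map each point id to the id of the first polygon containing it, or None."""
    return {pid: next((poly_id for poly_id, ring in polygons.items()
                       if point_in_polygon(p, ring)), None)
            for pid, p in points.items()}

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(locate({"p1": (2, 2), "p2": (5, 2)}, {"sq": square}))
```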

    B49 Line on polygon overlay (lp)

    The ability to superimpose a set of lines on a set of polygons, breaking the lines at intersections with polygon boundaries.

    B50 Polygon overlay (op)

    The ability to overlay digitally one set of polygons on another and form a topological intersection of the two, concatenating the attributes.

    B51 Sliver polygon removal (sp)

    The ability to delete automatically the small sliver polygons which result from a polygon overlay operation when certain polygon lines on the two maps represent different versions of the same physical line.

    B52 Line of sight (ln)

    The ability to determine the intervisibility of two points, or to determine those parts of pairs of lines or polygons which are intervisible.

    B53 Nearest neighbor search (nn)

    The ability to identify points, lines or polygons that are nearest to points, lines or polygons specified by location or attribute.

    B54 Shortest route (ps)

    The ability to determine the shortest or minimum cost route between two points or specified sets of points.
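One standard way to compute B54 on a network with non-negative costs is Dijkstra's algorithm. The sketch assumes a hypothetical adjacency-list graph (`node -> [(neighbor, cost), ...]`) and returns both the route and its total cost.

```python
import heapq

def shortest_route(graph, source, target):
    """Dijkstra's algorithm; returns (path, cost), or (None, inf) if unreachable."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue                  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:         # walk predecessors back to the source
        path.append(prev[path[-1]])
    return path[::-1], best[target]

roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2), ("D", 5)], "B": [("D", 1)]}
print(shortest_route(roads, "A", "D"))
```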

    B55 Contiguity analysis (co)

    The ability to identify areas that have a common boundary or node.

    B56 Connectivity analysis (cy)

    The ability to identify areas or points that are (or are not) connected to other areas or points by linear features.

    B57 Complex correlation (cx)

    The ability to compare maps representing different time periods, extracting differences or computing indices of change.

    B58 Weighted modelling (wm)

    The ability to assign weighting factors to individual data sets according to a set of rules and to overlay those data sets and carry out reclassify, dissolve and merge operations on the resulting concatenated data set.
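A minimal sketch of the weighting and reclassification steps of B58 (the dissolve and merge steps are omitted), assuming raster layers stored as equally sized lists of rows; the layers, weights, and class breaks are all hypothetical.

```python
from bisect import bisect_right

def weighted_overlay(layers, weights):
    """Cell-by-cell weighted sum of equally sized grids (lists of rows)."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for layer, w in zip(layers, weights))
             for c in range(cols)]
            for r in range(rows)]

def reclassify(grid, breaks):
    """Class of a cell = number of (ascending) break values <= the cell value."""
    return [[bisect_right(breaks, v) for v in row] for row in grid]

slope_score = [[1, 2], [3, 4]]
access_score = [[4, 3], [2, 1]]
combined = weighted_overlay([slope_score, access_score], [0.75, 0.25])
classes = reclassify(combined, [2.0, 3.0])
print(combined, classes)
```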

    B59 Scene generation (sg)

    The ability to simulate an image of the appearance of an area from map data. The image would normally consist of an oblique view, with perspective.

    B60 Network analysis (na)

    Simple forms of network analysis are covered in Shortest route and Connectivity. More complex analyses are frequently carried out on network data by electrical and gas utilities, communications companies etc. These include the simulation of flows in complex networks, load balancing in electrical distribution, traffic analysis, and computation of pressure loss in gas pipes. In many cases these capabilities can be found in existing packages which can be interfaced to the GIS database.
     
     

    Other groupings of GIS functions:

    Berry, J.K., 1987. Fundamental operations in computer-assisted map analysis. International Journal of GIS 1: 119-136.

    Goodchild, M.F., 1988. Towards an enumeration and classification of GIS functions. Proceedings, IGIS '87.

    Tomlin, C.D., 1990. Geographic Information Systems and Cartographic Modeling. Prentice Hall.
    (based on a standard, semi-formal taxonomy of analytic functions for raster data)

    Maguire, D.J., 1991. The functionality of GIS. Chapter 21 in D.J. Maguire, M.F. Goodchild and D.W. Rhind, editors, Geographical Information Systems: Principles and Applications. Longman, London.


    Integration of GIS and Spatial Analysis

    1. Full integration (embedding)

    2. Loose coupling

    3. Close coupling



    SECTION 2
    SPATIAL STATISTICS

    Section 2 - Spatial statistics - Simple measures for exploring geographic information - The value of the spatial perspective on data - Intuition and where it fails - Applications in crime analysis, emergencies, incidence of disease:

    Measures of spatial form:
     

    How to sum up a geographical distribution in a simple measure?
     

    Two concepts of space are relevant:
     

    Continuous:

  • an infinite number of locations exist
  • a means must exist to calculate distances between any pair of locations, e.g. using straight lines

    Discrete:

    In discrete space places are identified as objects; in continuous space, places are identified by coordinates

    A metric is a means of measuring distance between pairs of places (in continuous space)

  • e.g. by moves in N-S and E-W directions (the Manhattan or city-block metric)
  • simple metrics can be improved using barriers or routes of lower travel cost (freeways)
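The straight-line and city-block metrics differ only in how the coordinate differences are combined. A minimal sketch with two hypothetical places:

```python
from math import hypot

def euclidean(p, q):
    """Straight-line distance."""
    return hypot(q[0] - p[0], q[1] - p[1])

def manhattan(p, q):
    """City-block distance: moves only in N-S and E-W directions."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

home, store = (0.0, 0.0), (3.0, 4.0)
print(euclidean(home, store), manhattan(home, store))   # 5.0 7.0
```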

    The most useful single measure of a geographical distribution of objects is its center

    Definitions of center:

    The centroid

    The centroid is the point whose coordinates are the averages of the objects' x and y coordinates.

    The centroid is not the point for which half of the distribution is to the left, half to the right, half above and half below (that point is the bivariate median).

    The centroid is not the point that minimizes aggregate distance: if the objects were people and they all traveled to the centroid, the total distance traveled would not in general be a minimum. The point that does minimize it is the point of minimum aggregate travel (MAT point); the centroid instead minimizes the sum of squared distances.

    The definition of centrality becomes more difficult on the sphere

  • e.g. the centroid is below the surface
  • the centroid of the Canadian population in 1981 was about 90km below White River, Ontario
  • the bivariate median (defined by latitude and longitude) was at the intersection of the meridian passing through Toronto and the parallel through Montreal, near Burk's Falls, Ontario
  • the MAT point (assuming travel on the surface by great circle paths) was in a school yard in Richmond Hill, Ontario
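The three centers can be compared on a small planar point set (the data are made up, and the spherical complications above are ignored). The MAT point is approximated with Weiszfeld's iteration, a standard method swapped in here because the notes do not say how it was computed.

```python
from math import hypot
from statistics import median

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def bivariate_median(pts):
    return (median(x for x, _ in pts), median(y for _, y in pts))

def total_distance(c, pts):
    return sum(hypot(px - c[0], py - c[1]) for px, py in pts)

def mat_point(pts, iterations=200, eps=1e-12):
    """Point of minimum aggregate travel, approximated by Weiszfeld's iteration."""
    x, y = centroid(pts)
    for _ in range(iterations):
        wsum = xs = ys = 0.0
        for px, py in pts:
            d = hypot(px - x, py - y)
            if d < eps:   # simplification: stop if the iterate lands on a data point
                return (px, py)
            w = 1.0 / d   # each point pulls with weight 1 / distance
            wsum += w
            xs += w * px
            ys += w * py
        x, y = xs / wsum, ys / wsum
    return (x, y)

pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]   # one distant outlier
for name, c in [("centroid", centroid(pts)), ("median", bivariate_median(pts)),
                ("MAT", mat_point(pts))]:
    print(name, c, round(total_distance(c, pts), 3))
```

The outlier drags the centroid well away from the cluster, which is why its aggregate travel is the largest of the three.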

    What use are centers?

    Measures of dispersion:

    Potential measures:

    Potential is a useful measure of:

    Potential measures and density estimation
    think of a scatter of points representing people
    how to map the density of people?

    replace each dot by a pile of sand, superimposing the piles

    the amount of sand at any point represents the number and proximity of people

    the shape of the pile of sand is called the kernel function
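The sand-pile idea is kernel density estimation. A minimal sketch, assuming a Gaussian kernel (the notes do not fix the kernel's shape) with a hypothetical `bandwidth` parameter controlling how widely each pile spreads. Each pile holds one unit of sand, so the surface integrates to the number of people.

```python
from math import exp, hypot, pi

def kernel_density(x, y, points, bandwidth):
    """Height of the superimposed sand piles at (x, y)."""
    total = 0.0
    for px, py in points:
        d = hypot(x - px, y - py)
        # bivariate Gaussian pile of volume 1 centred on (px, py)
        total += exp(-0.5 * (d / bandwidth) ** 2) / (2 * pi * bandwidth ** 2)
    return total

people = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.8), (6.0, 6.0)]
print(kernel_density(0.3, 0.3, people, 1.0), kernel_density(6.0, 6.0, people, 1.0))
```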


     

    Measures of shape:
     



    Spatial Interpolation

    Spatial interpolation is defined as a process of determining the characteristics of objects from those of nearby objects

    The objects are most often points (sample observations) but may be lines or areas

    The attributes are most often interval-scaled (elevations) but may be of any type

    From a GIS perspective, spatial interpolation is a process of creating one class of objects from another class

    Spatial interpolation is often embedded in other processes, and is often used as part of a display process

  • e.g. to contour a surface from a set of sample points, it is necessary to use a method of spatial interpolation to determine where to place the contours among the points
    Many methods of spatial interpolation exist:
     
     

    Distance-weighted interpolation

    Known values exist at $n$ locations $x_i$, $i = 1, \dots, n$

    The value at a location $x_i$ is denoted by $z(x_i)$

    We need to guess the value at location $x$, denoted by $z(x)$

    The guessed value is an average over the known values at the sample points

    Let $d(x_i, x)$ denote the distance from location $x$, where we want to make a guess, to the $i$th sample point.

    Let $w[d]$ denote the weight given to a point at distance $d$ in calculating the average.

    The estimate at $x$ is calculated as:

    $$z(x) = \frac{\sum_{i=1}^{n} w[d(x_i, x)]\, z(x_i)}{\sum_{i=1}^{n} w[d(x_i, x)]}$$

    in other words, the average weighted by distance.
     

    The simplest kind of weight is a switch - a weight of 1 is given to any points within a certain distance of x, and a weight of 0 to all others

  • this means in effect that z(x) is calculated as the average over points within a window of a certain radius.
    Better methods include weights which are continuous, decreasing functions of distance such as an inverse square:

    $$w[d] = d^{-2}$$
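The weighted average above can be sketched with the two kinds of weight described: the switch and the inverse square. The sketch assumes sample points are ((x, y), z) pairs in planar coordinates. When x falls exactly on a sample point, the inverse-square weight is infinite, so the sketch returns that sample's value in that case.

```python
import math


def switch_weight(d, radius=1.0):
    """Switch weight: 1 within the window radius, 0 beyond it."""
    return 1.0 if d <= radius else 0.0


def inverse_square_weight(d):
    """w[d] = d**-2; infinite at d == 0, where the sample value is returned."""
    return math.inf if d == 0.0 else d ** -2


def distance_weighted_estimate(samples, x, weight):
    """Estimate z(x) as the weighted average of known values z(x_i).

    samples: list of ((x, y), z) pairs; x: (x, y) location to estimate;
    weight: function giving the weight w[d] of a point at distance d.
    """
    num = 0.0
    den = 0.0
    for xi, zi in samples:
        w = weight(math.dist(xi, x))
        if math.isinf(w):
            return zi  # x coincides with a sample point
        num += w * zi
        den += w
    if den == 0.0:
        raise ValueError("no sample point received a non-zero weight")
    return num / den
```

To use a window of a different radius, pass e.g. `lambda d: switch_weight(d, radius=1.5)`. Any other decreasing function of distance can be substituted in the same way.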

    All distance-weighted methods (e.g. inverse distance weighting, IDW) share the same strengths and drawbacks. They are:

     Although distance-weighted methods underlie many of the techniques in use, they are far from ideal
     
     
     

    Polynomial surfaces

    A polynomial function is fitted to the known values - interpolated values are obtained by evaluating the function
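The simplest polynomial surface is a first-order trend surface, the plane $z = b_0 + b_1 x + b_2 y$. The sketch below fits it by least squares, solving the 3×3 normal equations directly. Least squares and the first order are both assumptions, since the source specifies neither the fitting method nor the order. Higher-order surfaces would add terms such as $x^2$, $xy$ and $y^2$.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples):
    """Least-squares fit of z = b0 + b1*x + b2*y to ((x, y), z) samples.

    Forms the 3x3 normal equations and solves them by Cramer's rule.
    """
    n = sx = sy = sxx = syy = sxy = sz = sxz = syz = 0.0
    for (x, y), z in samples:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
        sz += z
        sxz += x * z
        syz += y * z
    a = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = _det3(a)
    if d == 0.0:
        raise ValueError("need at least three non-collinear sample points")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of the matrix with the right-hand side
        mk = [row[:k] + [rhs[i]] + row[k + 1:] for i, row in enumerate(a)]
        coeffs.append(_det3(mk) / d)
    return tuple(coeffs)


def evaluate_plane(coeffs, x, y):
    """Interpolated value: evaluate the fitted polynomial at (x, y)."""
    b0, b1, b2 = coeffs
    return b0 + b1 * x + b2 * y
```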
     

    Kriging

    Most real surfaces are observed to be spatially autocorrelated - that is, nearby points have values which are more similar than distant points.

    The amount and form of spatial autocorrelation can be described by a variogram, which shows how differences in values increase with geographical separation

    Observed variograms tend to have certain common features - differences increase with distance up to a certain value known as the sill, which is reached at a distance known as the range.
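An observed (empirical) variogram can be computed by grouping pairs of sample points into distance bins and averaging half the squared difference in value within each bin. This sketch uses that classical semivariance estimator with a hypothetical bin width. The sill and range would then be read from, or fitted to, the resulting curve.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, bin_width):
    """Half the mean squared difference between pairs of sample values,
    grouped into distance bins of width bin_width.

    samples: list of ((x, y), z) pairs.
    Returns a sorted list of (bin_centre_distance, semivariance, pair_count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        p_i, z_i = samples[i]
        for j in range(i + 1, len(samples)):
            p_j, z_j = samples[j]
            b = int(math.dist(p_i, p_j) // bin_width)  # distance bin index
            sums[b] += (z_i - z_j) ** 2
            counts[b] += 1
    return [
        ((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
        for b in sorted(counts)
    ]
```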
     

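    To make estimates by Kriging, a variogram is obtained from the observed values or past experience, and is then used to determine the weights given to the known values when estimating each unknown value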

    Locally-defined functions

    Some of the most satisfactory methods use a mosaic approach in which the surface is locally defined by a polynomial function, and the functions are arranged to fit together in some way

    With a TIN data structure it is possible to describe the surface within each triangle by a plane

    Another popular method fits a plane at each data point, then achieves a smooth surface by averaging planes at each interpolation point
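Within a single TIN triangle, the plane through the three vertices can be evaluated using barycentric coordinates, which weight each vertex value by how close the point lies to that vertex. The sketch below assumes vertices are given as ((x, y), z) pairs and that the point is already known to lie in the triangle. Locating the containing triangle is not shown.

```python
def tin_triangle_estimate(tri, p):
    """Estimate z at point p inside one TIN triangle, assuming the surface
    there is the plane through the triangle's three vertices.

    tri: three ((x, y), z) vertices; p: (x, y).
    """
    ((x1, y1), z1), ((x2, y2), z2), ((x3, y3), z3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric coordinates of p; they sum to 1 and reproduce the plane exactly
    l1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    l2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    l3 = 1.0 - l1 - l2
    if min(l1, l2, l3) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return l1 * z1 + l2 * z2 + l3 * z3
```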


    Exploratory Spatial Analysis

    The primary aim of spatial analysis should be to explore data from a spatial perspective, to gain insight and understanding

    This does not require techniques with great mathematical sophistication

    Many simple techniques can be devised to reveal patterns and trends in data

    In statistics, Exploratory Data Analysis was devised in the 1970s for a similar purpose

    Principles:

    What new dimensions does the digital environment offer?

    What other dimensions?

    An example of ESA techniques:

    John Haslett, Trinity College, Dublin (REGARD)

    What other non-intuitive aspects of spatial data can be revealed by simple ESA techniques?

    Hypothesis tests:

    Point pattern analysis

    There are two major options for non-random patterns:

    Unfortunately it is easy for this process of inference to come unstuck




     
     

    SECTION 3
    SPATIAL INTERACTION MODELS












    Section 3 - Spatial interaction models - What they are and where they're used - Calibration and "what-if" - Trade area analysis and market penetration:

    What is a spatial interaction model?

    Interaction is believed to be dependent on:

    Let: $i$ denote an origin object (often an area)

    $j$ denote a destination object (a point or area)

    $I^*_{ij}$ denote the observed interaction between $i$ and $j$, measured in appropriate units (e.g. numbers of trips, flow of goods, per defined interval of time)

    $I_{ij}$ denote the interaction predicted by the spatial interaction model

    $E_i$ denote the emissivity of the origin area $i$

    $A_j$ denote the attraction of the destination area $j$

    $C_{ij}$ denote the deterrence of the trip between $i$ and $j$ (probably some measure of the trip length or cost)

    $a$ a constant to be determined
     

    Then the most general form of spatial interaction model is:
     

    $$I_{ij} = a\, E_i\, A_j\, C_{ij}$$
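Below is a minimal sketch of this unconstrained model. It assumes the deterrence takes the power form $C_{ij} = d_{ij}^{-b}$, which is the form implied by the logarithmic linearization used for calibration, and it uses a hypothetical default b = 2. It also illustrates one property of the unconstrained form: adding a destination leaves flows to existing destinations unchanged and simply increases the total outflow.

```python
def gravity_flows(E, A, d, a=1.0, b=2.0):
    """Unconstrained model I_ij = a * E_i * A_j * C_ij, with C_ij = d_ij ** -b.

    E: emissivities of origins; A: attractions of destinations;
    d[i][j]: distance (or cost) from origin i to destination j.
    The power deterrence and the default b = 2 are assumptions.
    """
    return [
        [a * E[i] * A[j] * d[i][j] ** -b for j in range(len(A))]
        for i in range(len(E))
    ]
```

With one origin (E = 100) and destinations of attraction 10 and 20 at distances 1 and 2, the flows are 1000 and 500. Adding a third destination leaves those two values unchanged.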
     

    The model began life in the mid 19th century as an attempt to apply laws of gravitation to human communities - the gravity model

    In any application of the model, some aspects are assumed to be unknown, and determined by calibration

    Measurement of the variables:

    $C_{ij}$

    $E_i$, $A_j$

    The Huff model
  • what happens when a new destination is added?
  • interactions with existing destinations are unaffected
  • assumes outflow from origins can increase without limit
  • in practice, in many applications flow from origin to existing destinations will be diverted
  • we need some form of "production constraint"
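  • Huff proposed this change:
    $$I_{ij} = E_i \frac{A_j C_{ij}}{\sum_k A_k C_{ik}}$$
    so that the flows out of each origin always sum to $E_i$, however many destinations there are
  • Because of its production constraint, the Huff model is very popular in retail analysis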
  • it is often desirable to predict how much business a new store will draw from existing ones
  • e.g. how much will a new mall draw business away from downtown?
  • Other "what if" questions:

    Site modeling for retail applications
  • three major areas:
  • use of the spatial interaction model
  • analog techniques
  • regression models
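The production-constrained Huff form, $I_{ij} = E_i A_j C_{ij} / \sum_k A_k C_{ik}$, can be sketched in the same way. As before, $C_{ij} = d_{ij}^{-b}$ with a hypothetical b = 2 is assumed. Because each origin's outflow is fixed at $E_i$, a new store diverts business from existing ones, which is the retail "what if" question raised above.

```python
def huff_flows(E, A, d, b=2.0):
    """Production-constrained (Huff) model.

    I_ij = E_i * A_j * C_ij / sum_k(A_k * C_ik), with C_ij = d_ij ** -b assumed.
    Each origin's total outflow is fixed at E_i and shared among destinations.
    """
    flows = []
    for i in range(len(E)):
        utilities = [A[j] * d[i][j] ** -b for j in range(len(A))]
        total = sum(utilities)
        flows.append([E[i] * u / total for u in utilities])
    return flows
```

With one origin (E = 100) and stores of attraction 10 and 20 at distances 1 and 2, the first store draws about 66.7. Opening a third store of attraction 5 at distance 1 cuts that to 50, while the origin's total remains 100.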
    Analog:

    Regression:

    Exogenous factors:

    Example model:
     
     

    Sales per 2-week period for convenience store:
     

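    Calibration of the spatial interaction model

    Linearization:
  • transformations to make the right hand side of the equation a linear combination of unknowns, the left hand side known
  • Linearization of the unconstrained model, taking $C_{ij} = d_{ij}^{-b}$ and absorbing the constant $a$ into the $A_j$:
  • take the logs of both sides:
    $$\log(I_{ij}/E_i) = \log A_j - b \log d_{ij}$$
  • represent the unknown attractions with dummy variables $u_{ijk}$, equal to 1 when $j = k$ and 0 otherwise:
    $$\log(I_{ij}/E_i) = u_{ij1} \log A_1 + u_{ij2} \log A_2 + \dots - b \log d_{ij}$$

    The objective function: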
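One way to calibrate the linearized model is ordinary least squares on the logged values, which minimizes the sum of squared differences between observed and predicted $\log(I_{ij}/E_i)$. The sketch relies on the fact that regression on one dummy variable per destination is equivalent to regressing deviations from each destination's means (the "within" estimator), so the full dummy-variable matrix never has to be built. The tuple layout of `obs` is an assumption.

```python
import math


def calibrate_unconstrained(obs):
    """Fit log(I_ij / E_i) = log A_j - b log d_ij by ordinary least squares.

    obs: list of (j, E_i, d_ij, observed I*_ij) tuples with positive values.
    Returns (b, {j: A_j}).
    """
    groups = {}
    for j, e, d, flow in obs:
        y = math.log(flow / e)  # left hand side, known
        x = math.log(d)         # log distance
        groups.setdefault(j, []).append((x, y))
    # Mean log distance and mean left hand side for each destination
    means = {
        j: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for j, pts in groups.items()
    }
    sxy = sxx = 0.0
    for j, pts in groups.items():
        mx, my = means[j]
        for x, y in pts:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    if sxx == 0.0:
        raise ValueError("distances to each destination must vary")
    b = -sxy / sxx  # the fitted slope on log distance is -b
    A = {j: math.exp(my + b * mx) for j, (mx, my) in means.items()}
    return b, A
```

Given flows generated exactly by $A_0 = 10$, $A_1 = 20$ and $b = 1.5$, the function recovers those values. With real data, the residuals measure how well the model fits.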




     
     

    SECTION 4
    SPATIAL DEPENDENCE












    Section 4 - Spatial dependence - Looking at causes and effects in a geographical context:

    Two concepts:

    Spatial dependence

    Geary index:
  • compares the squared differences in value between neighboring objects with overall variance in values
    Moran index:
     

    Calculation of the Geary index of spatial autocorrelation


     
     
     

    c = 3 x 16 / (2 x 10 x 2) = 48 / 40 = 1.2
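The same calculation can be carried out for any set of values z and weight matrix w, using the usual definitions $c = (n-1) \sum_i \sum_j w_{ij}(z_i - z_j)^2 / (2W \sum_i (z_i - \bar z)^2)$ and $I = (n/W) \sum_i \sum_j w_{ij}(z_i - \bar z)(z_j - \bar z) / \sum_i (z_i - \bar z)^2$, where $W = \sum_i \sum_j w_{ij}$. In this sketch, $w_{ij}$ is assumed to be 1 for neighboring objects and 0 otherwise. Geary's c is near 1 when there is no autocorrelation, below 1 for positive autocorrelation and above 1 for negative autocorrelation.

```python
def geary_c(z, w):
    """Geary index: squared differences between neighbors vs overall variance."""
    n = len(z)
    mean = sum(z) / n
    W = sum(sum(row) for row in w)  # total of all weights
    num = sum(w[i][j] * (z[i] - z[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((zi - mean) ** 2 for zi in z)
    return (n - 1) * num / (2 * W * den)


def moran_i(z, w):
    """Moran index: cross-products of neighbors' deviations from the mean."""
    n = len(z)
    mean = sum(z) / n
    W = sum(sum(row) for row in w)
    num = sum(w[i][j] * (z[i] - mean) * (z[j] - mean)
              for i in range(n) for j in range(n))
    den = sum((zi - mean) ** 2 for zi in z)
    return n * num / (W * den)
```

For four values 1, 2, 3, 4 strung along a line, each a neighbor of the next, `geary_c` gives 0.3 and `moran_i` gives 1/3. Both results indicate positive autocorrelation.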
      Continuous space
  • see the discussion of variograms and Kriging
  • the term geostatistics is normally associated with continuous space, spatial statistics more with discrete space
    Measures of spatial dependence can be calculated in GIS:

    More extensive codes have been written using the statistical packages, e.g. MINITAB, SAS

    Spatial heterogeneity:

    Geographical brushing:

    Conventional analysis (analysis done aspatially, e.g. using a statistical package) assumes independence (no spatial dependence) and homogeneity (no spatial heterogeneity)

    An example:
     

    A related issue - the MAUP (modifiable areal unit problem)
     

    Various assumptions can be made about the underlying surface:

    Analysis carried out on modifiable units can produce frightening results

    Results of analysis using some alternative reporting zones:

    By regrouping the counties into larger regions, Openshaw and Taylor were able to generate a vast range of outcomes of the analysis:

    What to do?
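The following toy illustration shows the effect. It is not a reproduction of Openshaw and Taylor's analysis: the six units and their values are hypothetical. The same units are grouped into three zones in two different ways, with unit 5 assumed to adjoin unit 0 (as on a ring). Correlating the zone averages gives a strongly positive r under one zoning and a strongly negative r under the other.

```python
import math


def zone_means(values, zones):
    """Average a per-unit variable over each zone (a list of unit indices)."""
    return [sum(values[u] for u in zone) / len(zone) for zone in zones]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical values for six small units (not real data)
x = [1, 2, 3, 4, 5, 6]
y = [0, 10, 10, 2, 0, 15]

zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[1, 2], [3, 4], [5, 0]]

r_a = pearson(zone_means(x, zoning_a), zone_means(y, zoning_a))
r_b = pearson(zone_means(x, zoning_b), zone_means(y, zoning_b))
print(f"zoning A: r = {r_a:.2f}; zoning B: r = {r_b:.2f}")
# prints: zoning A: r = 0.99; zoning B: r = -0.97
```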

     
     

    SECTION 5
    SPATIAL DECISION SUPPORT








    Section 5 - Site selection - Locational analysis and location/allocation - Other forms of operations research in spatial analysis - Spatial decision support systems - Linking spatial analysis with GIS to support spatial decision-making:

    Methods of analysis on networks
     

    A spatial database can be used to support the solution of a variety of network problems, including optimal location, routing and vehicle scheduling

  • these include:
  • Routing:
  • Location:

    Example: Brine disposal in the Petrolia, Ontario oil field

    One disposal well per producer:

    One central facility:

    The location-allocation problem:
  • find locations for one or more central facilities and allocate producers to them in order to minimize the total of capital and transport costs
    Two alternatives for transport of waste brine to central facilities: pipe and truck.
     

    Pipe cost:
     

    Truck cost:

    Disposal well cost:

    Slide: Petrolia area

    Slide: Transport cost functions
     

    GIS implementation:

    Network of streets and rights of way - potential routes for trucks/pipes

    Links with attributes of length

    Nodes with attributes of volume produced - producer sites plus other potential well locations

    GIS database with nodes and links and associated attributes:

    Analysis module interacting with GIS database

    An analysis module supported by a GIS database provides a spatial decision support system (SDSS) tailored to specific, advanced forms of spatial analysis
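Finding shortest paths between points on the street and right-of-way network, using the link length attribute, is a standard shortest-path computation. The sketch below uses Dijkstra's algorithm, which is an assumption since the source does not name an algorithm. Links are given as hypothetical (node, node, length) tuples, and node IDs must be mutually comparable, for example all integers.

```python
import heapq


def shortest_path_lengths(links, source):
    """Dijkstra's algorithm over an undirected network.

    links: iterable of (node_a, node_b, length) tuples, i.e. links carrying a
    length attribute. Returns {node: shortest network distance from source}.
    """
    adjacency = {}
    for a, b, length in links:
        adjacency.setdefault(a, []).append((b, length))
        adjacency.setdefault(b, []).append((a, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry: a shorter path was already found
        for nbr, length in adjacency.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist
```

Running the function once from each producer node produces the producer-to-site distances that a location-allocation search needs.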
     

    Location-allocation analysis module:

    1. Find shortest paths between points on the network (could be a GIS function)
    2. Define and modify model parameters
    3. Use paths and parameters to calculate transport costs
    4. Search for optimum solution using add, drop and swap heuristics
    5. Evaluate solutions and print results
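The add, drop and swap search in step 4 can be sketched as follows. This is a simplified, hypothetical version. It assumes a single fixed capital cost per facility and a transport cost equal to volume times a per-unit cost `cost[p][s]` (which could, for example, be derived from shortest paths). The Petrolia study itself used separate pipe, truck and well cost functions. Like any heuristic of this kind, the search reaches a local optimum, not a guaranteed global one.

```python
def total_cost(open_sites, volumes, cost, facility_cost):
    """Capital cost of the open sites plus transport cost, with each producer
    allocated to its cheapest open site (cost[p][s] is cost per unit volume)."""
    transport = sum(v * min(cost[p][s] for s in open_sites) for p, v in volumes.items())
    return facility_cost * len(open_sites) + transport


def add_drop_swap(candidates, volumes, cost, facility_cost):
    """Local search over sets of open sites using add, drop and swap moves.

    Starts from the best single site and accepts any move that lowers the
    total cost, stopping when no add, drop or swap improves the solution.
    """
    def evaluate(sites):
        return total_cost(sites, volumes, cost, facility_cost)

    current = {min(candidates, key=lambda s: evaluate({s}))}
    current_cost = evaluate(current)
    improved = True
    while improved:
        improved = False
        outside = [s for s in candidates if s not in current]
        moves = [current | {s} for s in outside]  # add a site
        if len(current) > 1:
            moves += [current - {s} for s in current]  # drop a site
        moves += [(current - {s}) | {t} for s in current for t in outside]  # swap
        for trial in moves:
            trial_cost = evaluate(trial)
            if trial_cost < current_cost:
                current, current_cost = trial, trial_cost
                improved = True
                break
    return current, current_cost
```

For four producers at positions 0, 1, 10 and 11 on a line, each with unit volume and a facility cost of 5, the search settles on two sites with a total cost of 12.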

     
     
     
    Option                   No. of wells  Facility cost  Transport cost  $/m³ brine  $/m³ oil
    All producers            14            165,000        0               1.32        26.42
    Central by truck         2             45,000         395,827         3.53        70.59
    Central any nodes        2             60,000         79,619          1.12        22.36
    Central any producers    2             60,000         80,658          1.13        22.52
    Existing disposal wells  2             30,000         92,031          0.98        19.54

     
     
    Parameter        Value     % pipe   % truck   Optimum sites   Cost $000s
    Pipe cost A      30        74       26        4,8             80.7
                     60        53       47        2,4,7,9         76.3
                     15        87       13        4,8             56.6
    Pipe life B      10        74       26        4,8             80.7
                     8         67       33        2,4,7           73.0
                     6         62       38        2,4,7,9         69.4
                     4         47       53        2,4,7,9         86.0
    Pump cost C      2000      74       26        4,8             80.7
                     1000      77       23        2,4,7           52.8
                     500       77       23        2,4,7           46.8
    Well cost R      60,000    74       26        4,8             80.7
                     100,000   74       26        4,8             80.7
                     40,000    74       26        2,4,7,9         54.6
    Life of well S   4         74       26        4,8             80.7
                     8         74       26        2,4,7,9         54.6
    Brine ratio U    25        74       26        4,8             80.7
                     30        82       18        2,4,7           69.0
                     40        90       10        2,4,7,9         59.8
                     60        96       4         2,4,7           70.1
      Other examples of complex GIS-based analysis:
     

     

    Spatial search

    Boolean search

    Search through an attribute table to find objects satisfying a set of criteria

    Example:

    Forest stands - area object type, non-overlapping

    Attributes:

    area (reserved)

    species

    age

    For each stand, compare species and age to desired criteria. Dissolve and merge boundaries between neighboring stands if both fit the criteria

    Use tables to obtain estimated yield for given species/age and area

    Generate a map showing merged groups of cuttable stands, with new IDs, plus a table showing yield for each group.
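The stand search and dissolve steps can be sketched as follows. The sketch assumes a hypothetical in-memory representation in which each stand records its species, age, area (in acres) and the IDs of its neighboring stands. The functions `is_cuttable` and `yield_per_acre` are hypothetical stand-ins for the search criteria and the yield table. Neighboring qualifying stands are merged by a flood fill over the adjacency, which is one way to express dissolve-and-merge using attribute data alone.

```python
def cuttable_groups(stands, is_cuttable, yield_per_acre):
    """Boolean search over stands, then merge neighboring qualifying stands.

    stands: {stand_id: {"species": str, "age": int, "area": float,
                        "neighbors": set of stand_ids}}
    is_cuttable(species, age) -> bool applies the search criteria.
    yield_per_acre(species, age) -> float plays the role of the yield table.
    Returns a list of (new_group_id, member_ids, total_yield).
    """
    selected = {sid for sid, s in stands.items() if is_cuttable(s["species"], s["age"])}
    groups, seen = [], set()
    for start in sorted(selected):
        if start in seen:
            continue
        # Flood fill: collect all selected stands connected to this one
        members, frontier = [], [start]
        seen.add(start)
        while frontier:
            sid = frontier.pop()
            members.append(sid)
            for nbr in stands[sid]["neighbors"]:
                if nbr in selected and nbr not in seen:
                    seen.add(nbr)
                    frontier.append(nbr)
        total_yield = sum(
            stands[m]["area"] * yield_per_acre(stands[m]["species"], stands[m]["age"])
            for m in members
        )
        groups.append((len(groups) + 1, sorted(members), total_yield))
    return groups
```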

     
    Topological overlay

    Two or more coverages can be overlaid to obtain new object types with concatenated attributes. This allows Boolean search and related operations to be conducted on multiple object types, i.e. with more information available.

    Example:

    Add soil moisture information, from a separate coverage, to the criteria used to identify cuttable stands.

    Buffer zone generation

    A buffer zone allows Boolean searches to include criteria based on distance

    Example:

    A stand is cuttable only if it is not less than 200m from the nearest stream/lake

    In many cases it is not possible to reduce all criteria to simple yes/no requirements, e.g. from those stands satisfying criteria 1 and 2, select the stand which minimizes total cost (the sum of criteria 3, 4 and 5)

    When all non-conditional criteria are commensurate (dollars) they can be summed.
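The distance criterion and the summing of commensurate dollar costs can be combined in one short sketch. As a simplification, each stand is represented here by a single point (for example its centroid), and streams/lakes by polylines. A true buffer-zone test would use the stand polygon itself. The dictionary keys `point` and `costs` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Position of the closest point along the segment, clamped to its ends
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_polyline(p, line):
    """Shortest distance from p to any segment of a polyline."""
    return min(point_segment_distance(p, line[k], line[k + 1]) for k in range(len(line) - 1))


def cheapest_cuttable(stands, streams, min_distance=200.0):
    """Keep stands at least min_distance from every stream/lake line, then
    return the one whose summed dollar costs are lowest (None if none qualify)."""
    eligible = [
        s for s in stands
        if all(distance_to_polyline(s["point"], line) >= min_distance for line in streams)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda s: sum(s["costs"]))
```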
     
     

    In many cases criteria are not commensurate and cannot be summed.

    Example

    1. Timber extraction/hauling costs - direct $ costs

    2. Environmental cost of extraction - intangible

    3. Road construction cost - $, but long-term benefits

     
    Decision Theory provides methods for determining:

    Single Utility Functions (SUFs) for each criterion

    Multiple Utility Functions (MUFs) to combine criteria.
     
     

    Both SUFs and MUFs can be determined by experimental designs involving groups of decision-makers

    Decision theoretic methods can be incorporated into GIS technology. The GIS is used to evaluate the criteria for each alternative, then to weigh them using SUFs and MUFs to arrive at a decision.
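The following is a hedged illustration only. The sketch assumes linear SUFs, in which each criterion is rescaled to a 0..1 utility between a worst and a best value, and an additive, weighted MUF. In practice, SUFs and MUFs would come from the experimental designs with decision-makers described above and need not be linear or additive. All criterion ranges, weights and alternatives in the usage example are hypothetical.

```python
def linear_suf(worst, best):
    """Hypothetical single utility function: maps a criterion value linearly
    onto 0 (worst) .. 1 (best), clipped at both ends."""
    def utility(value):
        u = (value - worst) / (best - worst)
        return max(0.0, min(1.0, u))
    return utility


def additive_muf(sufs, weights):
    """Hypothetical multiple utility function: weighted sum of single utilities."""
    def utility(criteria):
        return sum(w * suf(c) for suf, w, c in zip(sufs, weights, criteria))
    return utility


def choose(alternatives, muf):
    """Return the name of the alternative with the highest combined utility."""
    return max(alternatives, key=lambda name: muf(alternatives[name]))
```

For example, suppose there are three criteria: hauling cost in dollars (worst 100,000, best 0), an environmental cost score (worst 10, best 0) and usable road length gained (worst 0, best 20), weighted 0.5, 0.3 and 0.2. A stand scoring (60,000, 2, 5) then beats one scoring (40,000, 8, 10), by 0.49 to 0.46.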
     
     
     

    A model for spatial analysis with a GIS
     

    Example of multi-stage GIS analysis

    Generation of a Recreation Opportunity Spectrum (ROS) map for a National Forest 1:24,000 quad (7.5 minute)
     
     

    Problem: generate zones and associated ROS classes for Forest Service land based on distance from transportation features, with urban exclusions.
     

    Data needed:

    D1: Roads and railways (1:24,000) - line objects

    D2: Forest Service ownership map (1:24,000) - area objects

    D3: City and town boundaries map (1:24,000) - area objects

      GIS functions:
    Reclassify attributes (B2)

    Dissolve and merge (B3)

    Generate corridors (B24)

    Topological overlay (B50)

    Measure size of areas (B35)

    Centroid calculation and sequential numbering (B8)

    Plot (A12)

    Create list and report (B1)

    Steps to make product:

    1. Using the Forest Service ownership data, reclassify area objects as forest land / not forest land. (B2)

    2. Dissolve boundaries between polygons with the same value of the forest land / not forest land attribute, and merge polygons (B3)

    3. Using the transportation map, generate corridors 0.5 miles wide around all roads and railways. (B24)

    4. Using the transportation map, generate corridors 1.0 miles wide around all roads and railways. (B24)

    5. Topologically overlay the results of 2, 3 and 4 and concatenate the attributes, to obtain polygons with the following attributes:

     
    forest land / not forest land

    within/outside 0.5 mile corridor

    within/outside 1.0 mile corridor (B50)
     

    6. Topologically overlay the urban boundary map, and concatenate attributes, adding urban/non-urban to the list in 5. (B50)

    7. Reclassify the area objects resulting from 6 according to the following rules:
     

    Class   Criteria

    Null    not forest land

    RMU     forest land and urban

    SPM     forest land, non-urban and within 0.5 miles of road/rail

    SPN     forest land, non-urban, outside 0.5 mile and inside 1.0 mile corridors

    P       forest land, non-urban, outside both 0.5 mile and 1.0 mile corridors

    (B2)

    8. Dissolve and merge adjacent polygons with the same class (B3)

    9. Measure areas of polygons resulting from 8 (B35)

    10. Reclassify polygons of class SPM according to the following rules:

    Class   Criteria

    SPM     Areas of less than 2500 acres

    RN      Areas of more than 2500 acres

    (B2)

    11. Calculate centroids and sequentially number polygons (B8)

    12. Plot classified polygons with classes and numbers assigned in 11, plus roads and railways and urban areas (A12)

    13. Create a list of all polygons, with IDs, areas and classes. (B1)
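The reclassification rules in steps 7 and 10 translate directly into decision functions. In this sketch, each polygon's overlaid attributes are assumed to be booleans. Polygons of exactly 2500 acres, which the rules do not cover, are left as SPM; this is an assumption.

```python
def ros_class(forest, urban, in_half_mile, in_one_mile):
    """Step 7 rules: assign a ROS class from the overlaid attributes."""
    if not forest:
        return "Null"
    if urban:
        return "RMU"
    if in_half_mile:
        return "SPM"
    if in_one_mile:
        return "SPN"  # outside the 0.5 mile corridor, inside the 1.0 mile one
    return "P"


def refine_spm(ros, area_acres):
    """Step 10 rules: SPM polygons of more than 2500 acres become RN."""
    if ros == "SPM" and area_acres > 2500:
        return "RN"
    return ros
```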

     
    Summary sequence of operations:

    Initial data sets: D1, D2, D3

    1. B2 on D2 -> E1

    2. B3 on E1 -> E2

    3. B24 on D1 -> E3

    4. B24 on D1 -> E4

    5. B50 on E2, E3, E4 -> E5

    6. B50 on E5, D3 -> E6

    7. B2 on E6 -> E7

    8. B3 on E7 -> E8

    9. B35 on E8 -> E9

    10. B2 on E9 -> E10

    11. B8 on E10 -> E11

    12. A12 on E11, D1, D3

    13. B1 on E11
     
     

    Many GIS applications require complex decision rules in reclassification operations.

    e.g. finding the most cuttable stand of timber:

    Criterion

    1. Area of stand > 100 acres (B35)

    2. More than 100m from stream/lake (B24)

    3. Subrules based on slope, aspect and soil mechanics determine method of timber extraction.

    4. Analysis of existing roads and terrain leads to estimates of costs of constructing new roads and hauling timber to mill

    5. Subrules based on costs of replanting, silviculture



     

    Districting

    Characteristics of application area

    Types of applications

    Organizations

    Districting example

    Background

    Objectives

    Technical requirements

    Slide: City and development areas

    Current districts

  • "starbursts" show allocations of building blocks to 29 current schools (includes two special education centers)
  • note bussed areas in NW and SW - separate enclaves of recent high-density housing allocated to distant schools
  • this strategy allows an expanding city to deal with
      • dropping school populations in the core leading to an excess of capacity
      • rising school populations in the periphery but lack of funds for new school construction
    without constantly adjusting boundaries
    Slide: Current districts

    Projections of enrollment based on current school districts

    Redistricting Proposals

    Slide: Projected enrolments

    Slide: Planned enrolments