Lecture 12

LECTURE 12: ANALYSIS (3)

1. DESCRIPTIVE SUMMARIES

Ways of capturing the properties of data sets in simple summaries

compare the mean, a descriptive summary of a set of numbers

the median for ordinal data

the mode for nominal data

what's the equivalent of the mean for spatial data? the average location, the center

for points:

(weighted) average of coordinates
the balance point
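
The (weighted) average of coordinates can be sketched in a few lines. This is a minimal illustration, not a particular GIS tool's method; the function name `mean_center` and the sample towns and populations are made up for the example.

```python
def mean_center(points, weights=None):
    """Return the (weighted) average of the x and y coordinates."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted case: every point counts equally
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy


# Hypothetical data: three towns, the third with twice the population.
towns = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
populations = [1.0, 1.0, 2.0]

unweighted = mean_center(towns)
weighted = mean_center(towns, populations)
print(unweighted, weighted)
```

Weighting pulls the center toward the heavier town, which is the balance-point behavior described above.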

the centroid for polygons:

centroid or balance point
a good place for a label

not the average of coordinates

calculation based on trapezia

compare calculation of area
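
A sketch of the polygon centroid calculation, assuming the standard signed-area (shoelace) formulas, which accumulate one signed contribution per edge in the same spirit as the trapezia used for area. The function name and the L-shaped test polygon are invented for illustration.

```python
def polygon_centroid(vertices):
    """Return (area, (cx, cy)) for a simple polygon given as a list of (x, y).

    Vertices may be listed clockwise or counter-clockwise; the ring is
    closed automatically.
    """
    twice_area = 0.0
    sum_x = 0.0
    sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed contribution of this edge
        twice_area += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    area = twice_area / 2.0
    centroid = (sum_x / (6.0 * area), sum_y / (6.0 * area))
    return abs(area), centroid


# Hypothetical L-shaped polygon.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
area, centroid = polygon_centroid(l_shape)

# For comparison: the plain average of the vertex coordinates.
vertex_mean = (sum(x for x, _ in l_shape) / len(l_shape),
               sum(y for _, y in l_shape) / len(l_shape))
print(area, centroid, vertex_mean)
```

For this shape the centroid differs from the average of the vertex coordinates, and it falls in the notch of the L, outside the polygon itself.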

in GIS "centroid" = any representative point

how many US county centroids lie outside their counties?

countries? e.g. Somalia (centroid falls in Ethiopia), Vietnam (centroid falls in Laos)

centers are valuable for tracking changes in data sets through time

e.g. the US center of population

centers are valuable as sites for services to serve dispersed populations

by minimizing distances

the most accessible point

the centroid minimizes the sum of squared distances

point of minimum aggregate travel (MAT)

not the centroid

the Varignon Frame experiment

an analog computer

a digital computer must use an iterative search to find the MAT
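
The notes do not say which iterative search is meant. One common choice is Weiszfeld's algorithm, sketched here as an assumption: start at the mean center and repeatedly re-average the points with weights inversely proportional to their current distance. Function names, the tolerance and the four test points are illustrative only.

```python
import math


def aggregate_distance(center, points, weights):
    """Total weighted straight-line distance from center to all points."""
    return sum(w * math.hypot(px - center[0], py - center[1])
               for (px, py), w in zip(points, weights))


def minimum_aggregate_travel(points, weights=None, tol=1e-9, max_iter=1000):
    """Approximate the MAT point with Weiszfeld's iteration (a sketch)."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    # Start from the weighted mean center.
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: stop if the estimate lands on a data point.
                return x, y
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


# Hypothetical demand points forming a convex quadrilateral.
pts = [(0.0, 0.0), (4.0, 0.0), (10.0, 10.0), (0.0, 3.0)]
ones = [1.0] * len(pts)
mat = minimum_aggregate_travel(pts)
mean_center = (sum(p[0] for p in pts) / 4, sum(p[1] for p in pts) / 4)
print(mat, aggregate_distance(mat, pts, ones))
print(mean_center, aggregate_distance(mean_center, pts, ones))
```

For four points in convex position the MAT point is where the diagonals cross, so the result can be checked by hand; it is clearly not the mean center, which is dragged toward the outlying point.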

location-allocation

search for optimum locations for any central facility

schools
fire stations
hospitals
retail stores
restaurants
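
A toy location-allocation sketch, assuming the p-median formulation (choose p sites from a candidate list so that total weighted distance to the nearest chosen site is smallest). The exhaustive search below is only workable for very small problems; real solvers use heuristics. All names and data are hypothetical.

```python
import math
from itertools import combinations


def total_travel(demand, sites):
    """Aggregate weighted distance when each demand point uses its nearest site."""
    return sum(w * min(math.dist(p, s) for s in sites) for p, w in demand)


def best_sites(demand, candidates, p):
    """Try every combination of p candidate sites and keep the cheapest."""
    return min(combinations(candidates, p),
               key=lambda sites: total_travel(demand, sites))


# Hypothetical demand: (location, weight), e.g. neighborhoods and pupils.
demand = [((0.0, 0.0), 10), ((1.0, 0.0), 20), ((10.0, 0.0), 5), ((11.0, 0.0), 8)]
candidates = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0), (11.0, 0.0)]

chosen = best_sites(demand, candidates, p=2)
print(chosen, total_travel(demand, chosen))
```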

dispersion around the center

mean distance from the center

standard deviation ellipse

black and white populations in Milwaukee
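
The two dispersion measures can be sketched as follows. The ellipse here is taken from the principal axes of the coordinate covariance matrix, with semi-axes equal to one standard deviation along each axis; this is an assumption, since software packages differ in how they scale the axes. The function name and sample points are invented.

```python
import math


def dispersion(points):
    """Mean distance from the center plus a standard deviation ellipse."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    # Variances and covariance of the coordinates about the center.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Orientation of the major axis, measured from the x axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Closed-form eigenvalues of the 2 x 2 covariance matrix.
    half = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(half + root)
    minor = math.sqrt(max(half - root, 0.0))
    return {"center": (cx, cy), "mean_distance": mean_dist,
            "angle_deg": math.degrees(theta), "major": major, "minor": minor}


# Hypothetical points stretched along the line y = x.
sample_pts = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 2), (2, 1)]
result = dispersion(sample_pts)
print(result)
```

Comparing the ellipses of two groups (their centers, sizes and orientations) is one way to summarize differences in their spatial distributions.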

2. OPTIMIZATION

Using analysis to create designs that improve, or optimize, some objective

finding the best location for a {point, line, area}
working with a community to select the best option
evaluating what-if scenarios

Spatial decision support systems (SDSS)

systems that aid in decision-making about spatial issues
where to put things?
which plan to select?
simulation of effects and impacts of decisions and alternatives
scenario evaluation

Examples

site selection
logistics - Schindler Elevator
routing over terrain

3. HYPOTHESIS TESTING

Statistical significance

Drug X produced a lower rate of heart attacks than in a control group, but is the reduction significant?
is the amount more than you would expect by chance?

chance alone will always produce some difference

but how much is expected?

The data as a sample

drawn randomly from a much larger population of possible values
each member of the population has an equal probability of being chosen
the process of drawing a sample produces variation among samples

how much is expected by chance?
is the observed amount more than this?

Example

what is the relationship between vegetation and elevation in California?

does White Fir (WF) grow at a higher elevation than Mixed Conifer Pine (MP)?

Calveg 1977

land cover map

30 m DEM of Santa Barbara area

can we generalize results from this one area to all of CA?

how to sample?

100 points

randomly located

generate in Excel

x from 0 to 92000

y from -446000 to -335000

overlay on vegetation polygons

summarize

no points in WF

10 in MP
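
The notes generate the random points in Excel; an equivalent sketch in Python, using the same coordinate ranges, is shown here. The function name and the seed are hypothetical, and the overlay on the vegetation polygons is not shown.

```python
import random

# Coordinate ranges quoted in the notes.
X_RANGE = (0.0, 92000.0)
Y_RANGE = (-446000.0, -335000.0)


def random_points(n, x_range=X_RANGE, y_range=Y_RANGE, seed=None):
    """Return n points located uniformly at random within the rectangle."""
    rng = random.Random(seed)  # a seed makes the sample repeatable
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]


sample = random_points(100, seed=12)
print(sample[:3])
```

With only 100 points, a class that covers a small share of the area can easily receive no points at all, which is what happened for WF.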

but why 100?

why not all cells in raster?

identity, summarize

mean elevation of WF = 7911 ft

mean elevation of MP = 5454 ft

but what about the ranges?

hypsometric curves for each class

standard deviations

WF = 530 ft based on 15361 points

5454 ft (the MP mean) is 4.64 WF standard deviations below the WF mean

MP = 783 ft based on 629793 points

7911 ft (the WF mean) is 3.13 MP standard deviations above the MP mean
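
The arithmetic behind these two figures is just the difference in means divided by each class's standard deviation. A small check using the rounded values quoted above (variable names are illustrative):

```python
# Means and standard deviations (ft) as quoted in the notes.
wf_mean, wf_sd = 7911.0, 530.0
mp_mean, mp_sd = 5454.0, 783.0

difference = wf_mean - mp_mean

# How far the MP mean sits below the WF mean, in WF standard deviations.
mp_below_wf = difference / wf_sd
# How far the WF mean sits above the MP mean, in MP standard deviations.
wf_above_mp = difference / mp_sd

print(round(mp_below_wf, 2), round(wf_above_mp, 2))
```

From these rounded inputs the second ratio comes out nearer 3.14 than 3.13; the small gap may simply reflect rounding of the quoted means and standard deviations.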

significance test

what is the chance that two samples have been drawn from the same population?

randomly and independently

i.e. that the two samples are not significantly different?

2-sample t test

with samples of this size there's no doubt about the outcome
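
To see why there is no doubt, the t statistic can be computed from the summary figures alone. This sketch assumes the unequal-variance (Welch) form of the 2-sample test, which the notes do not specify, and it treats every raster cell as an independent observation, which is exactly the assumption questioned below.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# WF versus MP, using the figures quoted in the notes.
t_stat = welch_t(7911.0, 530.0, 15361, 5454.0, 783.0, 629793)
print(t_stat)
```

The statistic runs to several hundred, far beyond any conventional critical value, so the outcome is driven almost entirely by the very large sample sizes.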

why not interpolate even more data points?

because they wouldn't be independent samples

Hypothesis testing with spatial data

what are the data a sample of?
what is the population?
e.g. what are all LA census tracts a sample of?
all US census tracts?
all possible versions of LA?

is the sample drawn independently?

Tobler's First Law of Geography: near things are more related than distant things, so nearby observations are not independent

Hypothesis testing with spatial data is problematic

but is it necessary?
much GIS analysis is simply descriptive

no need to assess significance
no need to generalize to a larger population

Options

1. throw away data

so that Tobler's First Law no longer applies to the retained sample

cases are truly independent

samples spaced beyond the range (the distance beyond which values are no longer correlated)

but this only fixes the spatial dependence problem
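
One way to throw away data is a greedy thinning pass that keeps a point only if it is at least the range away from every point already kept. This is a sketch under that assumption; the function name, the 30-unit grid and the 90-unit spacing are hypothetical, and in practice the range would have to be estimated first, e.g. from a variogram.

```python
import math


def thin_points(points, min_spacing):
    """Keep a subset of points with no two closer than min_spacing."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_spacing for q in kept):
            kept.append(p)
    return kept


# Hypothetical 10 x 10 grid of cell centers, 30 units apart.
grid = [(30 * col, 30 * row) for row in range(10) for col in range(10)]

# Assume a range of 90 units.
kept = thin_points(grid, min_spacing=90)
print(len(grid), len(kept))
```

Most of the data is discarded: the price of independence is a much smaller sample.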

2. deal directly with the spatial dependence

geostatistics

spatial statistics

3. no generalization, no significance tests